Uek5 master driver update by be2net · Pull Request #5 · oracle/linux-uek

be2net · 2018-11-29T17:02:54Z

be2net patch set

Lancer HW cannot handle a TSO packet with a single segment. Disable TSO/GSO for such packets. Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>

If the driver receives a TX CQE with status as 0x1 or 0x9 or 0xb, the completion indexes should not be used. The driver must stop consuming CQEs from this TXQ/CQ. The TXQ from this point on-wards to be in a bad state. Driver should destroy and recreate the TXQ. 0x1: LANCER_TX_COMP_LSO_ERR 0x9 LANCER_TX_COMP_SGE_ERR 0xb: LANCER_TX_COMP_PARITY_ERR Reset the adapter if driver sees this error in TX completion. Also adding sge error counter in ethtool stats. Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>

gregmarsden · 2018-11-30T15:00:11Z

Thank you for your submission to UEK! We are reviewing the requested changes.

Per our kernel review and merge process, we are closing this pull request, but that doesn't mean it's not being reviewed.

[ Upstream commit 46b3722 ] We occasionaly hit following assert failure in 'perf top', when processing the /proc info in multiple threads. perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: [Switching to Thread 0x7ffff11ba700 (LWP 13749)] 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 #1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 #2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000535373 in refcount_inc (r=0x7fffdc009be0) at ...include/linux/refcount.h:109 #5 0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0) at util/comm.c:24 #6 0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:72 #7 0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:95 #8 0x000000000053582e in comm__new (str=0x7fffd000b260 ":2", timestamp=0, exec=false) at util/comm.c:111 #9 0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57 #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457 #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... The problem is that we keep global comm_str_root list, which is accessed by multiple threads during the 'perf top' startup and following 2 paths can race: thread 1: ... thread__new comm__new comm_str__findnew down_write(&comm_str_lock); __comm_str__findnew comm_str__get thread 2: ... comm__override or comm__free comm_str__put refcount_dec_and_test down_write(&comm_str_lock); rb_erase(&cs->rb_node, &comm_str_root); Because thread 2 first decrements the refcnt and only after then it removes the struct comm_str from the list, the thread 1 can find this object on the list with refcnt equls to 0 and hit the assert. This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects that already dropped the refcnt to 0. For the rest of the objects we take the refcnt before comparing its name and release it afterwards with comm_str__put, which can also release the object completely. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3e1a127 upstream. When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…error The sequence that leads to this state is as follows. 1) First we see CQ error logged. Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core 0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2 vendor_error_syndrome=0x0 2) That is followed by the drop of the associated RDS connection. Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection <192.168.54.43,192.168.54.1,0> dropped due to 'qp event' 3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that. 4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection. crash64> bt 62577 PID: 62577 TASK: ffff88143f045400 CPU: 4 COMMAND: "kworker/u224:1" #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b #1 [ffff8813663bbbb0] schedule at ffffffff816abca7 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma] #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds] #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds] #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b #8 [ffff8813663bbec0] kthread at ffffffff810a304b #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752 crash64> It was stuck here in rds_ib_conn_shutdown for ever: /* quiesce tx and rx completion before tearing down */ while (!wait_event_timeout(rds_ib_ring_empty_wait, rds_ib_ring_empty(&ic->i_recv_ring) && (atomic_read(&ic->i_signaled_sends) == 0), msecs_to_jiffies(5000))) { /* Try to reap pending RX completions every 5 secs */ if (!rds_ib_ring_empty(&ic->i_recv_ring)) { spin_lock_bh(&ic->i_rx_lock); rds_ib_rx(ic); spin_unlock_bh(&ic->i_rx_lock); } } The recv ring was not empty. w_alloc_ptr = 560 w_free_ptr = 256 This is what Mellanox had to say: When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will generate Async event to notify this error, also the QPs that tries to access this CQ will be put to error state but will not be flushed since we must not post CQEs to a broken CQ. The QP that tries to access will also issue an Async catas event. In summary we cannot wait for any more WR_FLUSH_ERRs in that state. Orabug: 29180452 Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

The sequence that leads to this state is as follows. 1) First we see CQ error logged. Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core 0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2 vendor_error_syndrome=0x0 2) That is followed by the drop of the associated RDS connection. Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection <192.168.54.43,192.168.54.1,0> dropped due to 'qp event' 3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that. 4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection. crash64> bt 62577 PID: 62577 TASK: ffff88143f045400 CPU: 4 COMMAND: "kworker/u224:1" #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b #1 [ffff8813663bbbb0] schedule at ffffffff816abca7 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma] #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds] #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds] #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b #8 [ffff8813663bbec0] kthread at ffffffff810a304b #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752 crash64> It was stuck here in rds_ib_conn_shutdown for ever: /* quiesce tx and rx completion before tearing down */ while (!wait_event_timeout(rds_ib_ring_empty_wait, rds_ib_ring_empty(&ic->i_recv_ring) && (atomic_read(&ic->i_signaled_sends) == 0), msecs_to_jiffies(5000))) { /* Try to reap pending RX completions every 5 secs */ if (!rds_ib_ring_empty(&ic->i_recv_ring)) { spin_lock_bh(&ic->i_rx_lock); rds_ib_rx(ic); spin_unlock_bh(&ic->i_rx_lock); } } The recv ring was not empty. w_alloc_ptr = 560 w_free_ptr = 256 This is what Mellanox had to say: When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will generate Async event to notify this error, also the QPs that tries to access this CQ will be put to error state but will not be flushed since we must not post CQEs to a broken CQ. The QP that tries to access will also issue an Async catas event. In summary we cannot wait for any more WR_FLUSH_ERRs in that state. Orabug: 28733324 Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>

Refactor the code and export the build of the matched flow groups list to separate function. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Orabug: 28571062 (cherry picked from commit 46719d7) cherry-pick-repo=kernel/git/torvalds/linux.git Conflicts: drivers/net/ethernet/mellanox/mlx5/core/fs_core.c Conflict was caused by missing code semantics changes made in core/fs_core.c back in v4.15 (b92af5a). However, those changes in the commit (b92af5a) will be overwritten later by patch #5 (net/mlx5: Support multiple updates of steering rules in parallel) in this series. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Aron Silverton <aron.silverton@oracle.com>

…error The sequence that leads to this state is as follows. 1) First we see CQ error logged. Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core 0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2 vendor_error_syndrome=0x0 2) That is followed by the drop of the associated RDS connection. Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection <192.168.54.43,192.168.54.1,0> dropped due to 'qp event' 3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that. 4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection. crash64> bt 62577 PID: 62577 TASK: ffff88143f045400 CPU: 4 COMMAND: "kworker/u224:1" #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b #1 [ffff8813663bbbb0] schedule at ffffffff816abca7 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma] #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds] #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds] #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b #8 [ffff8813663bbec0] kthread at ffffffff810a304b #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752 crash64> It was stuck here in rds_ib_conn_shutdown for ever: /* quiesce tx and rx completion before tearing down */ while (!wait_event_timeout(rds_ib_ring_empty_wait, rds_ib_ring_empty(&ic->i_recv_ring) && (atomic_read(&ic->i_signaled_sends) == 0), msecs_to_jiffies(5000))) { /* Try to reap pending RX completions every 5 secs */ if (!rds_ib_ring_empty(&ic->i_recv_ring)) { spin_lock_bh(&ic->i_rx_lock); rds_ib_rx(ic); spin_unlock_bh(&ic->i_rx_lock); } } The recv ring was not empty. w_alloc_ptr = 560 w_free_ptr = 256 This is what Mellanox had to say: When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will generate Async event to notify this error, also the QPs that tries to access this CQ will be put to error state but will not be flushed since we must not post CQEs to a broken CQ. The QP that tries to access will also issue an Async catas event. In summary we cannot wait for any more WR_FLUSH_ERRs in that state. Orabug: 29180514 Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

The customer hit this crash few times. PID: 31556 TASK: ffff880f823caa00 CPU: 1 COMMAND: "cellsrv" #0 [ffff880f823db850] machine_kexec at ffffffff8105d93c #1 [ffff880f823db8b0] crash_kexec at ffffffff811103b3 #2 [ffff880f823db980] oops_end at ffffffff8101a788 #3 [ffff880f823db9b0] no_context at ffffffff8106b9cf #4 [ffff880f823dba20] __bad_area_nosemaphore at ffffffff8106bc9d #5 [ffff880f823dba70] bad_area at ffffffff8106be97 #6 [ffff880f823dbaa0] __do_page_fault at ffffffff8106c71e #7 [ffff880f823dbb00] do_page_fault at ffffffff8106c81f #8 [ffff880f823dbb40] page_fault at ffffffff816b5a9f [exception RIP: rds_ib_inc_copy_to_user+104] RIP: ffffffffa04607b8 RSP: ffff880f823dbbf8 RFLAGS: 00010287 RAX: 0000000000000340 RBX: 0000000000001000 RCX: 0000000000004000 RDX: 0000000000001000 RSI: ffff88176cea2000 RDI: ffff8817d291f520 RBP: ffff880f823dbc48 R8: 0000000000001340 R9: 0000000000001000 R10: 0000000000001200 R11: ffff880f823dc000 R12: ffff880f823dbed0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff880f823dbc50] rds_recvmsg at ffffffffa041d837 [rds] int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to) ... ... ibinc = container_of(inc, struct rds_ib_incoming, ii_inc); frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item); len = be32_to_cpu(inc->i_hdr.h_len); sg = frag->f_sg; while (iov_iter_count(to) && copied < len) { to_copy = min_t(unsigned long, iov_iter_count(to), sg->length - frag_off); ... sg is NULL and it crashes accessing sg->length above. The cause looks like is due to ic->i_frag_sz returning incorrect value. 16KB when 4KB was expected. if (copied % ic->i_frag_sz == 0) { frag = list_entry(frag->f_item.next, struct rds_page_frag, f_item); frag_off = 0; sg = frag->f_sg; } The other end is using 4KB RDS fragsize (Solaris Super Cluster). This end is UEK4 (4.1.12-94.8.4.el6uek.x86_64). The message being copied arrived over 4KB RDS frag size connection. But during the above check ic->i_frag_sz is 16KB. This can happen during a reconnect at the connection setup phase. We start off with ic->i_frag_sz as 16KB. Then settle down at 4KB. Failing this check if (copied % ic->i_frag_sz == 0) { can result in sg not getting set correctly. Say, "copied" = 4KB but ic->i_frag_sz is 16KB when it should be 4KB. During race condition with a reconnect, ic->i_frag_sz can be 16KB even though once the connection is set up it settled down to 4KB. It can change from 4KB to 16KB and back to 4KB during connection setup due to reconnect. We started seeing this crash after bug 26848749. But prior to that the same scenario could result in data copied to user from incorrect "sg" resulting in data corruption. Orabug: 28748049 Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

…error The sequence that leads to this state is as follows. 1) First we see CQ error logged. Sep 29 22:32:33 dm54cel14 kernel: [471472.784371] mlx4_core 0000:46:00.0: CQ access violation on CQN 000419 syndrome=0x2 vendor_error_syndrome=0x0 2) That is followed by the drop of the associated RDS connection. Sep 29 22:32:33 dm54cel14 kernel: [471472.784403] RDS/IB: connection <192.168.54.43,192.168.54.1,0> dropped due to 'qp event' 3) We don't get the WR_FLUSH_ERRs for the posted receive buffers after that. 4) RDS is stuck in rds_ib_conn_shutdown while shutting down that connection. crash64> bt 62577 PID: 62577 TASK: ffff88143f045400 CPU: 4 COMMAND: "kworker/u224:1" #0 [ffff8813663bbb58] __schedule at ffffffff816ab68b #1 [ffff8813663bbbb0] schedule at ffffffff816abca7 #2 [ffff8813663bbbd0] schedule_timeout at ffffffff816aee71 #3 [ffff8813663bbc80] rds_ib_conn_shutdown at ffffffffa041f7d1 [rds_rdma] #4 [ffff8813663bbd10] rds_conn_shutdown at ffffffffa03dc6e2 [rds] #5 [ffff8813663bbdb0] rds_shutdown_worker at ffffffffa03e2699 [rds] #6 [ffff8813663bbe00] process_one_work at ffffffff8109cda1 #7 [ffff8813663bbe50] worker_thread at ffffffff8109d92b #8 [ffff8813663bbec0] kthread at ffffffff810a304b #9 [ffff8813663bbf50] ret_from_fork at ffffffff816b0752 crash64> It was stuck here in rds_ib_conn_shutdown for ever: /* quiesce tx and rx completion before tearing down */ while (!wait_event_timeout(rds_ib_ring_empty_wait, rds_ib_ring_empty(&ic->i_recv_ring) && (atomic_read(&ic->i_signaled_sends) == 0), msecs_to_jiffies(5000))) { /* Try to reap pending RX completions every 5 secs */ if (!rds_ib_ring_empty(&ic->i_recv_ring)) { spin_lock_bh(&ic->i_rx_lock); rds_ib_rx(ic); spin_unlock_bh(&ic->i_rx_lock); } } The recv ring was not empty. w_alloc_ptr = 560 w_free_ptr = 256 This is what Mellanox had to say: When CQ moves to error (e.g. due to CQ Overrun, CQ Access violation) FW will generate Async event to notify this error, also the QPs that tries to access this CQ will be put to error state but will not be flushed since we must not post CQEs to a broken CQ. The QP that tries to access will also issue an Async catas event. In summary we cannot wait for any more WR_FLUSH_ERRs in that state. Orabug: 29180535 Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

commit c495144 upstream. We're getting a lockdep splat because we take the dio_sem under the log_mutex. What we really need is to protect fsync() from logging an extent map for an extent we never waited on higher up, so just guard the whole thing with dio_sem. ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted ------------------------------------------------------ aio-dio-invalid/30928 is trying to acquire lock: 0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0 but task is already holding lock: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (&ei->dio_sem){++++}: lock_acquire+0xbd/0x220 down_write+0x51/0xb0 btrfs_log_changed_extents+0x80/0xa40 btrfs_log_inode+0xbaf/0x1000 btrfs_log_inode_parent+0x26f/0xa80 btrfs_log_dentry_safe+0x50/0x70 btrfs_sync_file+0x357/0x540 do_fsync+0x38/0x60 __ia32_sys_fdatasync+0x12/0x20 do_fast_syscall_32+0x9a/0x2f0 entry_SYSENTER_compat+0x84/0x96 -> #4 (&ei->log_mutex){+.+.}: lock_acquire+0xbd/0x220 __mutex_lock+0x86/0xa10 btrfs_record_unlink_dir+0x2a/0xa0 btrfs_unlink+0x5a/0xc0 vfs_unlink+0xb1/0x1a0 do_unlinkat+0x264/0x2b0 do_fast_syscall_32+0x9a/0x2f0 entry_SYSENTER_compat+0x84/0x96 -> #3 (sb_internal#2){.+.+}: lock_acquire+0xbd/0x220 __sb_start_write+0x14d/0x230 start_transaction+0x3e6/0x590 btrfs_evict_inode+0x475/0x640 evict+0xbf/0x1b0 btrfs_run_delayed_iputs+0x6c/0x90 cleaner_kthread+0x124/0x1a0 kthread+0x106/0x140 ret_from_fork+0x3a/0x50 -> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}: lock_acquire+0xbd/0x220 __mutex_lock+0x86/0xa10 btrfs_alloc_data_chunk_ondemand+0x197/0x530 btrfs_check_data_free_space+0x4c/0x90 btrfs_delalloc_reserve_space+0x20/0x60 btrfs_page_mkwrite+0x87/0x520 do_page_mkwrite+0x31/0xa0 __handle_mm_fault+0x799/0xb00 handle_mm_fault+0x7c/0xe0 __do_page_fault+0x1d3/0x4a0 async_page_fault+0x1e/0x30 -> #1 (sb_pagefaults){.+.+}: lock_acquire+0xbd/0x220 __sb_start_write+0x14d/0x230 btrfs_page_mkwrite+0x6a/0x520 do_page_mkwrite+0x31/0xa0 __handle_mm_fault+0x799/0xb00 handle_mm_fault+0x7c/0xe0 __do_page_fault+0x1d3/0x4a0 async_page_fault+0x1e/0x30 -> #0 (&mm->mmap_sem){++++}: __lock_acquire+0x42e/0x7a0 lock_acquire+0xbd/0x220 down_read+0x48/0xb0 get_user_pages_unlocked+0x5a/0x1e0 get_user_pages_fast+0xa4/0x150 iov_iter_get_pages+0xc3/0x340 do_direct_IO+0xf93/0x1d70 __blockdev_direct_IO+0x32d/0x1c20 btrfs_direct_IO+0x227/0x400 generic_file_direct_write+0xcf/0x180 btrfs_file_write_iter+0x308/0x58c aio_write+0xf8/0x1d0 io_submit_one+0x3a9/0x620 __ia32_compat_sys_io_submit+0xb2/0x270 do_int80_syscall_32+0x5b/0x1a0 entry_INT80_compat+0x88/0xa0 other info that might help us debug this: Chain exists of: &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->dio_sem); lock(&ei->log_mutex); lock(&ei->dio_sem); lock(&mm->mmap_sem); *** DEADLOCK *** 1 lock held by aio-dio-invalid/30928: #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400 stack backtrace: CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Call Trace: dump_stack+0x7c/0xbb print_circular_bug.isra.37+0x297/0x2a4 check_prev_add.constprop.45+0x781/0x7a0 ? __lock_acquire+0x42e/0x7a0 validate_chain.isra.41+0x7f0/0xb00 __lock_acquire+0x42e/0x7a0 lock_acquire+0xbd/0x220 ? get_user_pages_unlocked+0x5a/0x1e0 down_read+0x48/0xb0 ? get_user_pages_unlocked+0x5a/0x1e0 get_user_pages_unlocked+0x5a/0x1e0 get_user_pages_fast+0xa4/0x150 iov_iter_get_pages+0xc3/0x340 do_direct_IO+0xf93/0x1d70 ? __alloc_workqueue_key+0x358/0x490 ? __blockdev_direct_IO+0x14b/0x1c20 __blockdev_direct_IO+0x32d/0x1c20 ? btrfs_run_delalloc_work+0x40/0x40 ? can_nocow_extent+0x490/0x490 ? kvm_clock_read+0x1f/0x30 ? can_nocow_extent+0x490/0x490 ? btrfs_run_delalloc_work+0x40/0x40 btrfs_direct_IO+0x227/0x400 ? btrfs_run_delalloc_work+0x40/0x40 generic_file_direct_write+0xcf/0x180 btrfs_file_write_iter+0x308/0x58c aio_write+0xf8/0x1d0 ? kvm_clock_read+0x1f/0x30 ? __might_fault+0x3e/0x90 io_submit_one+0x3a9/0x620 ? io_submit_one+0xe5/0x620 __ia32_compat_sys_io_submit+0xb2/0x270 do_int80_syscall_32+0x5b/0x1a0 entry_INT80_compat+0x88/0xa0 CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit ebaf39e ] The *_frag_reasm() functions are susceptible to miscalculating the byte count of packet fragments in case the truesize of a head buffer changes. The truesize member may be changed by the call to skb_unclone(), leaving the fragment memory limit counter unbalanced even if all fragments are processed. This miscalculation goes unnoticed as long as the network namespace which holds the counter is not destroyed. Should an attempt be made to destroy a network namespace that holds an unbalanced fragment memory limit counter the cleanup of the namespace never finishes. The thread handling the cleanup gets stuck in inet_frags_exit_net() waiting for the percpu counter to reach zero. The thread is usually in running state with a stacktrace similar to: PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 #10 [ffff880621563e38] process_one_work at ffffffff81096f14 It is not possible to create new network namespaces, and processes that call unshare() end up being stuck in uninterruptible sleep state waiting to acquire the net_mutex. The bug was observed in the IPv6 netfilter code by Per Sundstrom. I thank him for his analysis of the problem. The parts of this patch that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Reported-by: Per Sundstrom <per.sundstrom@redqube.se> Acked-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit c5a94f4 ] It was observed that a process blocked indefintely in __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP to be cleared via fscache_wait_for_deferred_lookup(). At this time, ->backing_objects was empty, which would normaly prevent __fscache_read_or_alloc_page() from getting to the point of waiting. This implies that ->backing_objects was cleared *after* __fscache_read_or_alloc_page was was entered. When an object is "killed" and then "dropped", FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is ->backing_objects cleared. This leaves a window where something else can set FSCACHE_COOKIE_LOOKING_UP and __fscache_read_or_alloc_page() can start waiting, before ->backing_objects is cleared There is some uncertainty in this analysis, but it seems to be fit the observations. Adding the wake in this patch will be handled correctly by __fscache_read_or_alloc_page(), as it checks if ->backing_objects is empty again, after waiting. Customer which reported the hang, also report that the hang cannot be reproduced with this fix. The backtrace for the blocked process looked like: PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh" #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache] #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache] #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs] #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs] #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs] #10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756 #11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa #12 [ffff881ff43eff18] sys_read at ffffffff811fda62 #13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

This work around should be reverted when upstream commit (d8b91dd Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) is available in uek. Issue appear is fixed in upstream tag 4.16.0 . Tag 4.15.0 still has this issue. The lask known commit on the perf topic branch that solve this issue is this: (c19d084 (tag: perf-core-for-mingo-4.16-20180125) perf trace beauty flock: Move to separate object file). Without this commit the perf topic branch has the below issue. With this commit the branch does not have the issue. Issue is that the above commit does not fix the issue on top of upstream tag 4.15.0. So the issue is probably fixed by this commit and some additional commits on the perf topic branch *or/and* on master branch below the point that the perf branch was branched. Also this specific commit is not a fix and the only possible relation to this bug is that it touches the 'flock' code which is used by bash/scripts to synchronize. To find the additional commits via git bisect I need to re-order the commits so that the above commit will be *below* the other commits that solve this issue. To do that I need to know what's the lowest commit that relate to this fix. I do not know and have no way to know that. Attempt to merge the perf topic on top of uek5 produce ~20k commits and tons of merge conflicts as uek5 is way behind the upstream. So can't even know if the topic branch with it's ~270 commits fix this issue for uek5. So I chose to work-around the issue and wait for the upstream topic merge to obsolite this commit. When issue occuer: Serial is flooded with messages: [71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms Then panic occur: [71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds. [71266.695759] Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2 [71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [71266.695761] NetworkManager D 0 5837 1 0x00000082 [71266.695765] Call Trace: [71266.695778] __schedule+0x2bc/0x8da [71266.695782] schedule+0x36/0x7c [71266.695785] schedule_preempt_disabled+0xe/0x10 [71266.695788] __mutex_lock.isra.5+0x20c/0x634 [71266.695792] __mutex_lock_slowpath+0x13/0x15 [71266.695794] mutex_lock+0x2f/0x3a [71266.695800] rtnetlink_rcv_msg+0x1d0/0x289 [71266.695806] ? __skb_try_recv_datagram+0xca/0x174 [71266.695809] ? rtnl_calcit.isra.25+0x110/0x103 [71266.695812] netlink_rcv_skb+0xdf/0x111 [71266.695816] rtnetlink_rcv+0x15/0x17 [71266.695818] netlink_unicast+0x18d/0x255 [71266.695820] netlink_sendmsg+0x2df/0x3cc [71266.695825] sock_sendmsg+0x3e/0x4a [71266.695828] ___sys_sendmsg+0x2b5/0x2c6 [71266.695832] ? arch_tlb_finish_mmu+0x1b/0xcb [71266.695835] ? tlb_finish_mmu+0x23/0x30 [71266.695838] ? unmap_region+0xf4/0x12d [71266.695844] ? lockref_put_or_lock+0x44/0x72 [71266.695846] ? __vma_rb_erase+0x10f/0x1f4 [71266.695850] __sys_sendmsg+0x54/0x8d [71266.695854] SyS_sendmsg+0x12/0x1c [71266.695860] do_syscall_64+0x79/0x1ae [71266.695864] entry_SYSCALL_64_after_hwframe+0x151/0x0 [71266.695866] RIP: 0033:0x7f16f2553c5d [71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d [71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007 [71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000 [71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380 [71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70 Issue analysis: The ip process is hung in addrconf_notify while trying to print to serial one of the below messages: "ADDRCONF(NETDEV_UP): %s: link is not ready\n" "ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n" The ip process hold the rtnl_lock while network-manager process try to grab this lock in 1 msec loop and every time it fail to grab the lock, the network-manager send additional line to the serial log as seen in the dmesg: "bondib0: link status up for interface ib0, enabling it in 0 ms" So the bond device flood the serial while waiting for the rtnl_lock while ip hold the rtnl_lock while waiting for the serial. Offending stack trace from vmcore is this: PID: 30063 TASK: ffff909c3f675a00 CPU: 7 COMMAND: "ip" #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750 [exception RIP: delay_tsc+51] RIP: ffffffff8e8558f3 RSP: ffff9f63c6c07390 RFLAGS: 00000046 RAX: 0000000016d23977 RBX: ffffffff903fbc00 RCX: 00009b7616d23038 RDX: 0000000000009b76 RSI: 0000000000000007 RDI: 000000000000095a RBP: ffff9f63c6c07390 R8: 00000000fffffffe R9: 0000000000000000 R10: 0000000000000005 R11: 0000000000020503 R12: 000000000000261f R13: 0000000000000020 R14: ffffffff8f96de2f R15: ffffffff903fbc00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 <NMI exception stack> #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247 RIP: 00007faf75ccafd0 RSP: 00007ffc710a9368 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000005c65f66d RCX: 00007faf75ccafd0 RDX: 0000000000000000 RSI: 00007ffc710a93b0 RDI: 0000000000000003 RBP: 00007ffc710a93b0 R8: 0000000000000000 R9: 0000000000000008 R10: 00007ffc710a8f30 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000066a440 R14: 00007ffc710a9458 R15: 00007ffc710a9b88 ORIG_RAX: 000000000000002e CS: 0033 SS: 002b Orabug: 29357838 Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com>

This work around should be reverted when upstream commit (d8b91dd Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) is available in uek. Issue appear is fixed in upstream tag 4.16.0 . Tag 4.15.0 still has this issue. The lask known commit on the perf topic branch that solve this issue is this: (c19d084 (tag: perf-core-for-mingo-4.16-20180125) perf trace beauty flock: Move to separate object file). Without this commit the perf topic branch has the below issue. With this commit the branch does not have the issue. Issue is that the above commit does not fix the issue on top of upstream tag 4.15.0. So the issue is probably fixed by this commit and some additional commits on the perf topic branch *or/and* on master branch below the point that the perf branch was branched. Also this specific commit is not a fix and the only possible relation to this bug is that it touches the 'flock' code which is used by bash/scripts to synchronize. To find the additional commits via git bisect I need to re-order the commits so that the above commit will be *below* the other commits that solve this issue. To do that I need to know what's the lowest commit that relate to this fix. I do not know and have no way to know that. Attempt to merge the perf topic on top of uek5 produce ~20k commits and tons of merge conflicts as uek5 is way behind the upstream. So can't even know if the topic branch with it's ~270 commits fix this issue for uek5. So I chose to work-around the issue and wait for the upstream topic merge to obsolite this commit. When issue occuer: Serial is flooded with messages: [71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms Then panic occur: [71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds. [71266.695759] Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2 [71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [71266.695761] NetworkManager D 0 5837 1 0x00000082 [71266.695765] Call Trace: [71266.695778] __schedule+0x2bc/0x8da [71266.695782] schedule+0x36/0x7c [71266.695785] schedule_preempt_disabled+0xe/0x10 [71266.695788] __mutex_lock.isra.5+0x20c/0x634 [71266.695792] __mutex_lock_slowpath+0x13/0x15 [71266.695794] mutex_lock+0x2f/0x3a [71266.695800] rtnetlink_rcv_msg+0x1d0/0x289 [71266.695806] ? __skb_try_recv_datagram+0xca/0x174 [71266.695809] ? rtnl_calcit.isra.25+0x110/0x103 [71266.695812] netlink_rcv_skb+0xdf/0x111 [71266.695816] rtnetlink_rcv+0x15/0x17 [71266.695818] netlink_unicast+0x18d/0x255 [71266.695820] netlink_sendmsg+0x2df/0x3cc [71266.695825] sock_sendmsg+0x3e/0x4a [71266.695828] ___sys_sendmsg+0x2b5/0x2c6 [71266.695832] ? arch_tlb_finish_mmu+0x1b/0xcb [71266.695835] ? tlb_finish_mmu+0x23/0x30 [71266.695838] ? unmap_region+0xf4/0x12d [71266.695844] ? lockref_put_or_lock+0x44/0x72 [71266.695846] ? __vma_rb_erase+0x10f/0x1f4 [71266.695850] __sys_sendmsg+0x54/0x8d [71266.695854] SyS_sendmsg+0x12/0x1c [71266.695860] do_syscall_64+0x79/0x1ae [71266.695864] entry_SYSCALL_64_after_hwframe+0x151/0x0 [71266.695866] RIP: 0033:0x7f16f2553c5d [71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d [71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007 [71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000 [71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380 [71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70 Issue analysis: The ip process is hung in addrconf_notify while trying to print to serial one of the below messages: "ADDRCONF(NETDEV_UP): %s: link is not ready\n" "ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n" The ip process hold the rtnl_lock while network-manager process try to grab this lock in 1 msec loop and every time it fail to grab the lock, the network-manager send additional line to the serial log as seen in the dmesg: "bondib0: link status up for interface ib0, enabling it in 0 ms" So the bond device flood the serial while waiting for the rtnl_lock while ip hold the rtnl_lock while waiting for the serial. Offending stack trace from vmcore is this: PID: 30063 TASK: ffff909c3f675a00 CPU: 7 COMMAND: "ip" #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750 [exception RIP: delay_tsc+51] RIP: ffffffff8e8558f3 RSP: ffff9f63c6c07390 RFLAGS: 00000046 RAX: 0000000016d23977 RBX: ffffffff903fbc00 RCX: 00009b7616d23038 RDX: 0000000000009b76 RSI: 0000000000000007 RDI: 000000000000095a RBP: ffff9f63c6c07390 R8: 00000000fffffffe R9: 0000000000000000 R10: 0000000000000005 R11: 0000000000020503 R12: 000000000000261f R13: 0000000000000020 R14: ffffffff8f96de2f R15: ffffffff903fbc00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 <NMI exception stack> #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247 RIP: 00007faf75ccafd0 RSP: 00007ffc710a9368 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000005c65f66d RCX: 00007faf75ccafd0 RDX: 0000000000000000 RSI: 00007ffc710a93b0 RDI: 0000000000000003 RBP: 00007ffc710a93b0 R8: 0000000000000000 R9: 0000000000000008 R10: 00007ffc710a8f30 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000066a440 R14: 00007ffc710a9458 R15: 00007ffc710a9b88 ORIG_RAX: 000000000000002e CS: 0033 SS: 002b Orabug: 29016284 Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com>

Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the function called from __sanitizer_cov_trace_const_cmp4 shouldn't be traceable either. ftrace_graph_caller() gets called every time func write_comp_data() gets called if it isn't marked 'notrace'. This is the backtrace from gdb: #0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179 #1 0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151 #2 0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116 #3 0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188 #4 0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27 #5 0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182 Rework so that write_comp_data() that are called from __sanitizer_cov_trace_*_cmp*() are marked as 'notrace'. Commit 903e8ff ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace") missed to mark write_comp_data() as 'notrace'. When that patch was created gcc-7 was used. In lib/Kconfig.debug config KCOV_ENABLE_COMPARISONS depends on $(cc-option,-fsanitize-coverage=trace-cmp) That code path isn't hit with gcc-7. However, it were that with gcc-8. Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Co-developed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6347244) Orabug: 29558684 Signed-off-by: George Kennedy <george.kennedy@oracle.com> Reviewed-by: John Donnelly <john.p.donnelly@oracle.com>

This work around should be reverted when upstream commit (d8b91dd Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) is available in uek. Issue appear is fixed in upstream tag 4.16.0 . Tag 4.15.0 still has this issue. The lask known commit on the perf topic branch that solve this issue is this: (c19d084 (tag: perf-core-for-mingo-4.16-20180125) perf trace beauty flock: Move to separate object file). Without this commit the perf topic branch has the below issue. With this commit the branch does not have the issue. Issue is that the above commit does not fix the issue on top of upstream tag 4.15.0. So the issue is probably fixed by this commit and some additional commits on the perf topic branch *or/and* on master branch below the point that the perf branch was branched. Also this specific commit is not a fix and the only possible relation to this bug is that it touches the 'flock' code which is used by bash/scripts to synchronize. To find the additional commits via git bisect I need to re-order the commits so that the above commit will be *below* the other commits that solve this issue. To do that I need to know what's the lowest commit that relate to this fix. I do not know and have no way to know that. Attempt to merge the perf topic on top of uek5 produce ~20k commits and tons of merge conflicts as uek5 is way behind the upstream. So can't even know if the topic branch with it's ~270 commits fix this issue for uek5. So I chose to work-around the issue and wait for the upstream topic merge to obsolite this commit. When issue occuer: Serial is flooded with messages: [71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms Then panic occur: [71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds. [71266.695759] Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2 [71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [71266.695761] NetworkManager D 0 5837 1 0x00000082 [71266.695765] Call Trace: [71266.695778] __schedule+0x2bc/0x8da [71266.695782] schedule+0x36/0x7c [71266.695785] schedule_preempt_disabled+0xe/0x10 [71266.695788] __mutex_lock.isra.5+0x20c/0x634 [71266.695792] __mutex_lock_slowpath+0x13/0x15 [71266.695794] mutex_lock+0x2f/0x3a [71266.695800] rtnetlink_rcv_msg+0x1d0/0x289 [71266.695806] ? __skb_try_recv_datagram+0xca/0x174 [71266.695809] ? rtnl_calcit.isra.25+0x110/0x103 [71266.695812] netlink_rcv_skb+0xdf/0x111 [71266.695816] rtnetlink_rcv+0x15/0x17 [71266.695818] netlink_unicast+0x18d/0x255 [71266.695820] netlink_sendmsg+0x2df/0x3cc [71266.695825] sock_sendmsg+0x3e/0x4a [71266.695828] ___sys_sendmsg+0x2b5/0x2c6 [71266.695832] ? arch_tlb_finish_mmu+0x1b/0xcb [71266.695835] ? tlb_finish_mmu+0x23/0x30 [71266.695838] ? unmap_region+0xf4/0x12d [71266.695844] ? lockref_put_or_lock+0x44/0x72 [71266.695846] ? __vma_rb_erase+0x10f/0x1f4 [71266.695850] __sys_sendmsg+0x54/0x8d [71266.695854] SyS_sendmsg+0x12/0x1c [71266.695860] do_syscall_64+0x79/0x1ae [71266.695864] entry_SYSCALL_64_after_hwframe+0x151/0x0 [71266.695866] RIP: 0033:0x7f16f2553c5d [71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d [71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007 [71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000 [71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380 [71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70 Issue analysis: The ip process is hung in addrconf_notify while trying to print to serial one of the below messages: "ADDRCONF(NETDEV_UP): %s: link is not ready\n" "ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n" The ip process hold the rtnl_lock while network-manager process try to grab this lock in 1 msec loop and every time it fail to grab the lock, the network-manager send additional line to the serial log as seen in the dmesg: "bondib0: link status up for interface ib0, enabling it in 0 ms" So the bond device flood the serial while waiting for the rtnl_lock while ip hold the rtnl_lock while waiting for the serial. Offending stack trace from vmcore is this: PID: 30063 TASK: ffff909c3f675a00 CPU: 7 COMMAND: "ip" #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750 [exception RIP: delay_tsc+51] RIP: ffffffff8e8558f3 RSP: ffff9f63c6c07390 RFLAGS: 00000046 RAX: 0000000016d23977 RBX: ffffffff903fbc00 RCX: 00009b7616d23038 RDX: 0000000000009b76 RSI: 0000000000000007 RDI: 000000000000095a RBP: ffff9f63c6c07390 R8: 00000000fffffffe R9: 0000000000000000 R10: 0000000000000005 R11: 0000000000020503 R12: 000000000000261f R13: 0000000000000020 R14: ffffffff8f96de2f R15: ffffffff903fbc00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 <NMI exception stack> #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247 RIP: 00007faf75ccafd0 RSP: 00007ffc710a9368 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000005c65f66d RCX: 00007faf75ccafd0 RDX: 0000000000000000 RSI: 00007ffc710a93b0 RDI: 0000000000000003 RBP: 00007ffc710a93b0 R8: 0000000000000000 R9: 0000000000000008 R10: 00007ffc710a8f30 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000066a440 R14: 00007ffc710a9458 R15: 00007ffc710a9b88 ORIG_RAX: 000000000000002e CS: 0033 SS: 002b Orabug: 29631452 Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com>

…_map [ Upstream commit 39df730 ] Detected via gcc's ASan: Direct leak of 2048 byte(s) in 64 object(s) allocated from: 6 #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370) 7 #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43 8 #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85 9 #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250 10 #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382 11 #5 0x556b0f0e3231 in print_events util/parse-events.c:2514 12 #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 13 #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 14 #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 15 #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 16 #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520 17 #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct") Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 54569ba ] Detected with gcc's ASan: Direct leak of 66 byte(s) in 5 object(s) allocated from: #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x560c8761034d in collect_config util/config.c:597 #2 0x560c8760d9cb in get_value util/config.c:169 #3 0x560c8760dfd7 in perf_parse_file util/config.c:285 #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476 #5 0x560c876108fd in perf_config_set__init util/config.c:661 #6 0x560c87610c72 in perf_config_set__new util/config.c:709 #7 0x560c87610d2f in perf_config__init util/config.c:718 #8 0x560c87610e5d in perf_config util/config.c:730 #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442 #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Taeung Song <treeze.taeung@gmail.com> Fixes: 20105ca ("perf config: Introduce perf_config_set class") Link: http://lkml.kernel.org/r/20190316080556.3075-6-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 8bde851 ] Detected with gcc's ASan: Direct leak of 4356 byte(s) in 120 object(s) allocated from: #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215 #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339 #3 0x55719af66272 in print_events util/parse-events.c:2542 #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520 #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 40218da ("perf list: Show SDT and pre-cached events") Link: http://lkml.kernel.org/r/20190316080556.3075-7-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 42dfa45 ] Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e5330a5e in zalloc util/util.h:23 #2 0x5625e5330a9b in perf_counts__new util/counts.c:10 #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #7 0x5625e51528e6 in run_test tests/builtin-test.c:358 #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #10 0x5625e515572f in cmd_test tests/builtin-test.c:722 #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e532560d in zalloc util/util.h:23 #2 0x5625e532566b in xyarray__new util/xyarray.c:10 #3 0x5625e5330aba in perf_counts__new util/counts.c:15 #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #8 0x5625e51528e6 in run_test tests/builtin-test.c:358 #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #11 0x5625e515572f in cmd_test tests/builtin-test.c:722 #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

…_event_on_all_cpus test [ Upstream commit 93faa52 ] ================================================================= ==7497==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45 #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103 #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120 #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135 #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36 #6 0x5625e51528e6 in run_test tests/builtin-test.c:358 #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #9 0x5625e515572f in cmd_test tests/builtin-test.c:722 #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object") Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f97a899 ] ================================================================= ==7506==ERROR: LeakSanitizer: detected memory leaks Direct leak of 13 byte(s) in 3 object(s) allocated from: #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x5625e53aaef0 in expr__find_other util/expr.y:221 #2 0x5625e51bcd3f in test__expr tests/expr.c:52 #3 0x5625e51528e6 in run_test tests/builtin-test.c:358 #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #6 0x5625e515572f in cmd_test tests/builtin-test.c:722 #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON") Link: http://lkml.kernel.org/r/20190316080556.3075-16-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit d982b33 ] ================================================================= ==20875==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1160 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x55bd50005599 in zalloc util/util.h:23 #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327 #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216 #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69 #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358 #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388 #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583 #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722 #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520 #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 19 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 01ca667 upstream. Syzkaller report this: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G C 5.0.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573 Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00 RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001 R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211 __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072 drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934 destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319 __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff If alloc_workqueue fails, it should return -ENOMEM, otherwise may trigger this NULL pointer dereference while unloading drivers. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0a38c17 ("fm10k: Remove create_workqueue") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 7494cec ] Calling kvm_is_visible_gfn() implies that we're parsing the memslots, and doing this without the srcu lock is frown upon: [12704.164532] ============================= [12704.164544] WARNING: suspicious RCU usage [12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G W [12704.164573] ----------------------------- [12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [12704.164602] other info that might help us debug this: [12704.164616] rcu_scheduler_active = 2, debug_locks = 1 [12704.164631] 6 locks held by qemu-system-aar/13968: [12704.164644] #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [12704.164691] #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [12704.164726] #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164761] #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164794] #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164827] #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164861] stack backtrace: [12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G W 5.1.0-rc1-00008-g600025238f51-dirty #16 [12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [12704.164896] Call trace: [12704.164910] dump_backtrace+0x0/0x138 [12704.164920] show_stack+0x24/0x30 [12704.164934] dump_stack+0xbc/0x104 [12704.164946] lockdep_rcu_suspicious+0xcc/0x110 [12704.164958] gfn_to_memslot+0x174/0x190 [12704.164969] kvm_is_visible_gfn+0x28/0x70 [12704.164980] vgic_its_check_id.isra.0+0xec/0x1e8 [12704.164991] vgic_its_save_tables_v0+0x1ac/0x330 [12704.165001] vgic_its_set_attr+0x298/0x3a0 [12704.165012] kvm_device_ioctl_attr+0x9c/0xd8 [12704.165022] kvm_device_ioctl+0x8c/0xf8 [12704.165035] do_vfs_ioctl+0xc8/0x960 [12704.165045] ksys_ioctl+0x8c/0xa0 [12704.165055] __arm64_sys_ioctl+0x28/0x38 [12704.165067] el0_svc_common+0xd8/0x138 [12704.165078] el0_svc_handler+0x38/0x78 [12704.165089] el0_svc+0x8/0xc Make sure the lock is taken when doing this. Fixes: bf30824 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>

commit 01ca667 upstream. Syzkaller report this: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G C 5.0.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573 Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00 RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001 R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211 __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072 drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934 destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319 __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff If alloc_workqueue fails, it should return -ENOMEM, otherwise may trigger this NULL pointer dereference while unloading drivers. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0a38c17 ("fm10k: Remove create_workqueue") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 9b9b0df4e7882638e53c55e8f556aa78915418b9) Orabug: 30322694 CVE: CVE-2019-15924 Reviewed-by: John Donnelly <John.p.donnelly@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Orabug: 30898008 Shrinker hold sb->s_umount lock and invoked .destroy_inode to reclaim inode, if fs was freezed, Shrinker would hung by freeze lock. But unfreeze could never happen because it would be hung by sb->s_umount. Backgroud inode inactivation feature could fix this, but it was not merged by mainline yet, according Darrick, even merged, it would be nearly impossible to backport to 4.14. The effort here is to make a one OFF-MAINLINE fix for uek, if future uek have that feature merged, this patch should be dropped. To avoid deadlock, add inode needing inactivation to list and destroy them async. crash7latest> set 132 PID: 132 COMMAND: "kswapd0:0" TASK: ffff9cdc9dfb5f00 [THREAD_INFO: ffff9cdc9dfb5f00] CPU: 6 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 132 TASK: ffff9cdc9dfb5f00 CPU: 6 COMMAND: "kswapd0:0" #0 [ffffaa5d075bf900] __schedule at ffffffff8186487c #1 [ffffaa5d075bf998] schedule at ffffffff81864e96 #2 [ffffaa5d075bf9b0] rwsem_down_read_failed at ffffffff818689ee #3 [ffffaa5d075bfa40] call_rwsem_down_read_failed at ffffffff81859308 #4 [ffffaa5d075bfa90] __percpu_down_read at ffffffff810ebd38 #5 [ffffaa5d075bfab0] __sb_start_write at ffffffff812859ef #6 [ffffaa5d075bfad0] xfs_trans_alloc at ffffffffc07ebe9c [xfs] #7 [ffffaa5d075bfb18] xfs_free_eofblocks at ffffffffc07c39d1 [xfs] #8 [ffffaa5d075bfb80] xfs_inactive at ffffffffc07de878 [xfs] #9 [ffffaa5d075bfba0] __dta_xfs_fs_destroy_inode_3543 at ffffffffc07e885e [xfs] #10 [ffffaa5d075bfbd0] destroy_inode at ffffffff812a25de #11 [ffffaa5d075bfbe8] evict at ffffffff812a2b73 #12 [ffffaa5d075bfc10] dispose_list at ffffffff812a2c1d #13 [ffffaa5d075bfc38] prune_icache_sb at ffffffff812a421a #14 [ffffaa5d075bfc70] super_cache_scan at ffffffff812870a1 #15 [ffffaa5d075bfcc8] shrink_slab at ffffffff811eebb3 #16 [ffffaa5d075bfdb0] shrink_node at ffffffff811f4788 #17 [ffffaa5d075bfe38] kswapd at ffffffff811f58c3 #18 [ffffaa5d075bff08] kthread at ffffffff810b75d5 #19 [ffffaa5d075bff50] ret_from_fork at ffffffff81a0035e crash7latest> set 31060 PID: 31060 COMMAND: "safefreeze" TASK: ffff9cd292868000 [THREAD_INFO: ffff9cd292868000] CPU: 2 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 31060 TASK: ffff9cd292868000 CPU: 2 COMMAND: "safefreeze" #0 [ffffaa5d10047c90] __schedule at ffffffff8186487c #1 [ffffaa5d10047d28] schedule at ffffffff81864e96 #2 [ffffaa5d10047d40] rwsem_down_write_failed at ffffffff81868f18 #3 [ffffaa5d10047dd8] call_rwsem_down_write_failed at ffffffff81859367 #4 [ffffaa5d10047e20] down_write at ffffffff81867cfd #5 [ffffaa5d10047e38] thaw_super at ffffffff81285d2d #6 [ffffaa5d10047e60] do_vfs_ioctl at ffffffff81299566 #7 [ffffaa5d10047ee8] sys_ioctl at ffffffff81299709 #8 [ffffaa5d10047f28] do_syscall_64 at ffffffff81003949 #9 [ffffaa5d10047f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad RIP: 0000000000453d67 RSP: 00007ffff9c1ce78 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000001cbe92c RCX: 0000000000453d67 RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000014 RBP: 00007ffff9c1cf80 R8: 0000000000000000 R9: 0000000000000012 R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000401fb0 R13: 0000000000402040 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

Orabug: 30898016 Shrinker hold sb->s_umount lock and invoked .destroy_inode to reclaim inode, if fs was freezed, Shrinker would hung by freeze lock. But unfreeze could never happen because it would be hung by sb->s_umount. Backgroud inode inactivation feature could fix this, but it was not merged by mainline yet, according Darrick, even merged, it would be nearly impossible to backport to 4.14. The effort here is to make a one OFF-MAINLINE fix for uek, if future uek have that feature merged, this patch should be dropped. To avoid deadlock, add inode needing inactivation to list and destroy them async. crash7latest> set 132 PID: 132 COMMAND: "kswapd0:0" TASK: ffff9cdc9dfb5f00 [THREAD_INFO: ffff9cdc9dfb5f00] CPU: 6 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 132 TASK: ffff9cdc9dfb5f00 CPU: 6 COMMAND: "kswapd0:0" #0 [ffffaa5d075bf900] __schedule at ffffffff8186487c #1 [ffffaa5d075bf998] schedule at ffffffff81864e96 #2 [ffffaa5d075bf9b0] rwsem_down_read_failed at ffffffff818689ee #3 [ffffaa5d075bfa40] call_rwsem_down_read_failed at ffffffff81859308 #4 [ffffaa5d075bfa90] __percpu_down_read at ffffffff810ebd38 #5 [ffffaa5d075bfab0] __sb_start_write at ffffffff812859ef #6 [ffffaa5d075bfad0] xfs_trans_alloc at ffffffffc07ebe9c [xfs] #7 [ffffaa5d075bfb18] xfs_free_eofblocks at ffffffffc07c39d1 [xfs] #8 [ffffaa5d075bfb80] xfs_inactive at ffffffffc07de878 [xfs] #9 [ffffaa5d075bfba0] __dta_xfs_fs_destroy_inode_3543 at ffffffffc07e885e [xfs] #10 [ffffaa5d075bfbd0] destroy_inode at ffffffff812a25de #11 [ffffaa5d075bfbe8] evict at ffffffff812a2b73 #12 [ffffaa5d075bfc10] dispose_list at ffffffff812a2c1d #13 [ffffaa5d075bfc38] prune_icache_sb at ffffffff812a421a #14 [ffffaa5d075bfc70] super_cache_scan at ffffffff812870a1 #15 [ffffaa5d075bfcc8] shrink_slab at ffffffff811eebb3 #16 [ffffaa5d075bfdb0] shrink_node at ffffffff811f4788 #17 [ffffaa5d075bfe38] kswapd at ffffffff811f58c3 #18 [ffffaa5d075bff08] kthread at ffffffff810b75d5 #19 [ffffaa5d075bff50] ret_from_fork at ffffffff81a0035e crash7latest> set 31060 PID: 31060 COMMAND: "safefreeze" TASK: ffff9cd292868000 [THREAD_INFO: ffff9cd292868000] CPU: 2 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 31060 TASK: ffff9cd292868000 CPU: 2 COMMAND: "safefreeze" #0 [ffffaa5d10047c90] __schedule at ffffffff8186487c #1 [ffffaa5d10047d28] schedule at ffffffff81864e96 #2 [ffffaa5d10047d40] rwsem_down_write_failed at ffffffff81868f18 #3 [ffffaa5d10047dd8] call_rwsem_down_write_failed at ffffffff81859367 #4 [ffffaa5d10047e20] down_write at ffffffff81867cfd #5 [ffffaa5d10047e38] thaw_super at ffffffff81285d2d #6 [ffffaa5d10047e60] do_vfs_ioctl at ffffffff81299566 #7 [ffffaa5d10047ee8] sys_ioctl at ffffffff81299709 #8 [ffffaa5d10047f28] do_syscall_64 at ffffffff81003949 #9 [ffffaa5d10047f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad RIP: 0000000000453d67 RSP: 00007ffff9c1ce78 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000001cbe92c RCX: 0000000000453d67 RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000014 RBP: 00007ffff9c1cf80 R8: 0000000000000000 R9: 0000000000000012 R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000401fb0 R13: 0000000000402040 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

commit 42ffb0b upstream. There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: b293f02 ("Btrfs: Add writepages support") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2d5a2f9 upstream. I see the following lockdep splat in the qcom pinctrl driver when attempting to suspend the device. WARNING: possible recursive locking detected 5.4.11 #3 Tainted: G W -------------------------------------------- cat/3074 is trying to acquire lock: ffffff81f49804c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 but task is already holding lock: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by cat/3074: #0: ffffff81f01d9420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4 #1: ffffff81bd7d2080 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc #2: ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc #3: ffffffe411a41d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348 #4: ffffff81f1c5e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c #5: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 stack backtrace: CPU: 5 PID: 3074 Comm: cat Tainted: G W 5.4.11 #3 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xc8/0x124 __lock_acquire+0x460/0x2388 lock_acquire+0x1cc/0x210 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 qpnpint_irq_set_wake+0x28/0x34 set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 pm8941_pwrkey_suspend+0x34/0x44 platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x310/0x41c dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x620 pm_suspend+0x210/0x348 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x15c/0x1fc __vfs_write+0x54/0x18c vfs_write+0xe4/0x1a4 ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_handler+0x7c/0x98 el0_svc+0x8/0xc Set a lockdep class when we map the irq so that irq_set_wake() doesn't warn about a lockdep bug that doesn't exist. Fixes: 12a9eea ("spmi: pmic-arb: convert to v2 irq interfaces to support hierarchical IRQ chips") Cc: Douglas Anderson <dianders@chromium.org> Cc: Brian Masney <masneyb@onstation.org> Cc: Lina Iyer <ilina@codeaurora.org> Cc: Maulik Shah <mkshah@codeaurora.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20200121183748.68662-1-swboyd@chromium.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit c34f6dc ] I see the following lockdep splat in the qcom pinctrl driver when attempting to suspend the device. ============================================ WARNING: possible recursive locking detected 5.4.2 #2 Tainted: G S -------------------------------------------- cat/6536 is trying to acquire lock: ffffff814787ccc0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 but task is already holding lock: ffffff81436740c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by cat/6536: #0: ffffff8140e0c420 (sb_writers#7){.+.+}, at: vfs_write+0xc8/0x19c #1: ffffff8121eec480 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x1f4 #2: ffffff8147cad668 (kn->count#263){.+.+}, at: kernfs_fop_write+0x130/0x1f4 #3: ffffffd011446000 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x354 #4: ffffff814302b970 (&dev->mutex){....}, at: __device_suspend+0x16c/0x420 #5: ffffff81436740c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94 #6: ffffff81479b8c10 (&pctrl->lock){....}, at: msm_gpio_irq_set_wake+0x48/0x7c stack backtrace: CPU: 4 PID: 6536 Comm: cat Tainted: G S 5.4.2 #2 Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xdc/0x144 __lock_acquire+0x52c/0x2268 lock_acquire+0x1dc/0x220 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 msm_gpio_irq_set_wake+0x5c/0x7c set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 cros_ec_rtc_suspend+0x38/0x4c platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x314/0x420 dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x628 pm_suspend+0x214/0x354 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x158/0x1f4 __vfs_write+0x54/0x18c vfs_write+0xdc/0x19c ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x10 This is because the msm_gpio_irq_set_wake() function calls irq_set_irq_wake() as a backup in case the irq comes in during the path to idle. Given that we're calling irqchip functions from within an irqchip we need to set the lockdep class to be different for this child controller vs. the default one that the parent irqchip gets. This used to be done before this driver was converted to hierarchical irq domains in commit e35a6ae ("pinctrl/msm: Setup GPIO chip in hierarchy") via the gpiochip_irq_map() function. With hierarchical irq domains this function has been replaced by gpiochip_hierarchy_irq_domain_alloc(). Therefore, set the lockdep class like was done previously in the irq domain path so we can avoid this lockdep warning. Fixes: fdd61a0 ("gpio: Add support for hierarchical IRQ domains") Cc: Thierry Reding <treding@nvidia.com> Cc: Brian Masney <masneyb@onstation.org> Cc: Lina Iyer <ilina@codeaurora.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20200114231103.85641-1-swboyd@chromium.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 35df429 upstream. EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4] write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127: ext4_write_end+0x4e3/0x750 [ext4] ext4_update_i_disksize at fs/ext4/ext4.h:3032 (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046 (inlined by) ext4_write_end at fs/ext4/inode.c:1287 generic_perform_write+0x208/0x2a0 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37: ext4_writepages+0x10ac/0x1d00 [ext4] mpage_map_and_submit_extent at fs/ext4/inode.c:2468 (inlined by) ext4_writepages at fs/ext4/inode.c:2772 do_writepages+0x5e/0x130 __writeback_single_inode+0xeb/0xb20 writeback_sb_inodes+0x429/0x900 __writeback_inodes_wb+0xc4/0x150 wb_writeback+0x4bd/0x870 wb_workfn+0x6b4/0x960 process_one_work+0x54c/0xbe0 worker_thread+0x80/0x650 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 Reported by Kernel Concurrency Sanitizer on: CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: writeback wb_workfn (flush-7:0) Since only the read is operating as lockless (outside of the "i_data_sem"), load tearing could introduce a logic bug. Fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the write. Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Orabug: 30944736 Shrinker hold sb->s_umount lock and invoked .destroy_inode to reclaim inode, if fs was freezed, Shrinker would hung by freeze lock. But unfreeze could never happen because it would be hung by sb->s_umount. Backgroud inode inactivation feature could fix this, but it was not merged by mainline yet, according Darrick, even merged, it would be nearly impossible to backport to 4.14. The effort here is to make a one OFF-MAINLINE fix for uek, if future uek have that feature merged, this patch should be dropped. To avoid deadlock, add inode needing inactivation to list and destroy them async. crash7latest> set 132 PID: 132 COMMAND: "kswapd0:0" TASK: ffff9cdc9dfb5f00 [THREAD_INFO: ffff9cdc9dfb5f00] CPU: 6 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 132 TASK: ffff9cdc9dfb5f00 CPU: 6 COMMAND: "kswapd0:0" #0 [ffffaa5d075bf900] __schedule at ffffffff8186487c #1 [ffffaa5d075bf998] schedule at ffffffff81864e96 #2 [ffffaa5d075bf9b0] rwsem_down_read_failed at ffffffff818689ee #3 [ffffaa5d075bfa40] call_rwsem_down_read_failed at ffffffff81859308 #4 [ffffaa5d075bfa90] __percpu_down_read at ffffffff810ebd38 #5 [ffffaa5d075bfab0] __sb_start_write at ffffffff812859ef #6 [ffffaa5d075bfad0] xfs_trans_alloc at ffffffffc07ebe9c [xfs] #7 [ffffaa5d075bfb18] xfs_free_eofblocks at ffffffffc07c39d1 [xfs] #8 [ffffaa5d075bfb80] xfs_inactive at ffffffffc07de878 [xfs] #9 [ffffaa5d075bfba0] __dta_xfs_fs_destroy_inode_3543 at ffffffffc07e885e [xfs] #10 [ffffaa5d075bfbd0] destroy_inode at ffffffff812a25de #11 [ffffaa5d075bfbe8] evict at ffffffff812a2b73 #12 [ffffaa5d075bfc10] dispose_list at ffffffff812a2c1d #13 [ffffaa5d075bfc38] prune_icache_sb at ffffffff812a421a #14 [ffffaa5d075bfc70] super_cache_scan at ffffffff812870a1 #15 [ffffaa5d075bfcc8] shrink_slab at ffffffff811eebb3 #16 [ffffaa5d075bfdb0] shrink_node at ffffffff811f4788 #17 [ffffaa5d075bfe38] kswapd at ffffffff811f58c3 #18 [ffffaa5d075bff08] kthread at ffffffff810b75d5 #19 [ffffaa5d075bff50] ret_from_fork at ffffffff81a0035e crash7latest> set 31060 PID: 31060 COMMAND: "safefreeze" TASK: ffff9cd292868000 [THREAD_INFO: ffff9cd292868000] CPU: 2 STATE: TASK_UNINTERRUPTIBLE crash7latest> bt PID: 31060 TASK: ffff9cd292868000 CPU: 2 COMMAND: "safefreeze" #0 [ffffaa5d10047c90] __schedule at ffffffff8186487c #1 [ffffaa5d10047d28] schedule at ffffffff81864e96 #2 [ffffaa5d10047d40] rwsem_down_write_failed at ffffffff81868f18 #3 [ffffaa5d10047dd8] call_rwsem_down_write_failed at ffffffff81859367 #4 [ffffaa5d10047e20] down_write at ffffffff81867cfd #5 [ffffaa5d10047e38] thaw_super at ffffffff81285d2d #6 [ffffaa5d10047e60] do_vfs_ioctl at ffffffff81299566 #7 [ffffaa5d10047ee8] sys_ioctl at ffffffff81299709 #8 [ffffaa5d10047f28] do_syscall_64 at ffffffff81003949 #9 [ffffaa5d10047f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad RIP: 0000000000453d67 RSP: 00007ffff9c1ce78 RFLAGS: 00000206 RAX: ffffffffffffffda RBX: 0000000001cbe92c RCX: 0000000000453d67 RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000014 RBP: 00007ffff9c1cf80 R8: 0000000000000000 R9: 0000000000000012 R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000401fb0 R13: 0000000000402040 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>

[ Upstream commit 42ffb0b ] There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: b293f02 ("Btrfs: Add writepages support") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 35df429 upstream. EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4] write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127: ext4_write_end+0x4e3/0x750 [ext4] ext4_update_i_disksize at fs/ext4/ext4.h:3032 (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046 (inlined by) ext4_write_end at fs/ext4/inode.c:1287 generic_perform_write+0x208/0x2a0 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37: ext4_writepages+0x10ac/0x1d00 [ext4] mpage_map_and_submit_extent at fs/ext4/inode.c:2468 (inlined by) ext4_writepages at fs/ext4/inode.c:2772 do_writepages+0x5e/0x130 __writeback_single_inode+0xeb/0xb20 writeback_sb_inodes+0x429/0x900 __writeback_inodes_wb+0xc4/0x150 wb_writeback+0x4bd/0x870 wb_workfn+0x6b4/0x960 process_one_work+0x54c/0xbe0 worker_thread+0x80/0x650 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 Reported by Kernel Concurrency Sanitizer on: CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: writeback wb_workfn (flush-7:0) Since only the read is operating as lockless (outside of the "i_data_sem"), load tearing could introduce a logic bug. Fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the write. Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d #17 [ffffc9002219fc90] schedule at ffffffff81a3e219 #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> #1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit a866759 ] This reverts commit 64e62bd. This commit ends up causing some lockdep splats due to trying to grab the payload lock while holding the mgr's lock: [ 54.010099] [ 54.011765] ====================================================== [ 54.018670] WARNING: possible circular locking dependency detected [ 54.025577] 5.5.0-rc6-02274-g77381c23ee63 #47 Not tainted [ 54.031610] ------------------------------------------------------ [ 54.038516] kworker/1:6/1040 is trying to acquire lock: [ 54.044354] ffff888272af3228 (&mgr->payload_lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.054957] [ 54.054957] but task is already holding lock: [ 54.061473] ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.071193] [ 54.071193] which lock already depends on the new lock. [ 54.071193] [ 54.080334] [ 54.080334] the existing dependency chain (in reverse order) is: [ 54.088697] [ 54.088697] -> #1 (&mgr->lock){+.+.}: [ 54.094440] __mutex_lock+0xc3/0x498 [ 54.099015] drm_dp_mst_topology_get_port_validated+0x25/0x80 [ 54.106018] drm_dp_update_payload_part1+0xa2/0x2e2 [ 54.112051] intel_mst_pre_enable_dp+0x144/0x18f [ 54.117791] intel_encoders_pre_enable+0x63/0x70 [ 54.123532] hsw_crtc_enable+0xa1/0x722 [ 54.128396] intel_update_crtc+0x50/0x194 [ 54.133455] skl_commit_modeset_enables+0x40c/0x540 [ 54.139485] intel_atomic_commit_tail+0x5f7/0x130d [ 54.145418] intel_atomic_commit+0x2c8/0x2d8 [ 54.150770] drm_atomic_helper_set_config+0x5a/0x70 [ 54.156801] drm_mode_setcrtc+0x2ab/0x833 [ 54.161862] drm_ioctl+0x2e5/0x424 [ 54.166242] vfs_ioctl+0x21/0x2f [ 54.170426] do_vfs_ioctl+0x5fb/0x61e [ 54.175096] ksys_ioctl+0x55/0x75 [ 54.179377] __x64_sys_ioctl+0x1a/0x1e [ 54.184146] do_syscall_64+0x5c/0x6d [ 54.188721] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 54.194946] [ 54.194946] -> #0 (&mgr->payload_lock){+.+.}: [ 54.201463] [ 54.201463] other info that might help us debug this: [ 54.201463] [ 54.210410] Possible unsafe locking scenario: [ 54.210410] [ 54.217025] CPU0 CPU1 [ 54.222082] ---- ---- [ 54.227138] lock(&mgr->lock); [ 54.230643] lock(&mgr->payload_lock); [ 54.237742] lock(&mgr->lock); [ 54.244062] lock(&mgr->payload_lock); [ 54.248346] [ 54.248346] *** DEADLOCK *** [ 54.248346] [ 54.254959] 7 locks held by kworker/1:6/1040: [ 54.259822] #0: ffff888275c4f528 ((wq_completion)events){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.269451] #1: ffffc9000119beb0 ((work_completion)(&(&dev_priv->hotplug.hotplug_work)->work)){+.+.}, at: worker_thread+0x455/0x6e2 [ 54.282768] #2: ffff888272a403f0 (&dev->mode_config.mutex){+.+.}, at: i915_hotplug_work_func+0x4b/0x2be [ 54.293368] #3: ffffffff824fc6c0 (drm_connector_list_iter){.+.+}, at: i915_hotplug_work_func+0x17e/0x2be [ 54.304061] #4: ffffc9000119bc58 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x40/0xfd [ 54.314855] #5: ffff888272a40470 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x74/0xe2 [ 54.324385] #6: ffff888272af3060 (&mgr->lock){+.+.}, at: drm_dp_mst_topology_mgr_set_mst+0x3c/0x2e4 [ 54.334597] [ 54.334597] stack backtrace: [ 54.339464] CPU: 1 PID: 1040 Comm: kworker/1:6 Not tainted 5.5.0-rc6-02274-g77381c23ee63 #47 [ 54.348893] Hardware name: Google Fizz/Fizz, BIOS Google_Fizz.10139.39.0 01/04/2018 [ 54.357451] Workqueue: events i915_hotplug_work_func [ 54.362995] Call Trace: [ 54.365724] dump_stack+0x71/0x9c [ 54.369427] check_noncircular+0x91/0xbc [ 54.373809] ? __lock_acquire+0xc9e/0xf66 [ 54.378286] ? __lock_acquire+0xc9e/0xf66 [ 54.382763] ? lock_acquire+0x175/0x1ac [ 54.387048] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.393177] ? __mutex_lock+0xc3/0x498 [ 54.397362] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.403492] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.409620] ? drm_dp_dpcd_access+0xd9/0x101 [ 54.414390] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.420517] ? drm_dp_mst_topology_mgr_set_mst+0x218/0x2e4 [ 54.426645] ? intel_digital_port_connected+0x34d/0x35c [ 54.432482] ? intel_dp_detect+0x227/0x44e [ 54.437056] ? ww_mutex_lock+0x49/0x9a [ 54.441242] ? drm_helper_probe_detect_ctx+0x75/0xfd [ 54.446789] ? intel_encoder_hotplug+0x4b/0x97 [ 54.451752] ? intel_ddi_hotplug+0x61/0x2e0 [ 54.456423] ? mark_held_locks+0x53/0x68 [ 54.460803] ? _raw_spin_unlock_irqrestore+0x3a/0x51 [ 54.466347] ? lockdep_hardirqs_on+0x187/0x1a4 [ 54.471310] ? drm_connector_list_iter_next+0x89/0x9a [ 54.476953] ? i915_hotplug_work_func+0x206/0x2be [ 54.482208] ? worker_thread+0x4d5/0x6e2 [ 54.486587] ? worker_thread+0x455/0x6e2 [ 54.490966] ? queue_work_on+0x64/0x64 [ 54.495151] ? kthread+0x1e9/0x1f1 [ 54.498946] ? queue_work_on+0x64/0x64 [ 54.503130] ? kthread_unpark+0x5e/0x5e [ 54.507413] ? ret_from_fork+0x3a/0x50 The proper fix for this is probably cleanup the VCPI allocations when we're enabling the topology, or on the first payload allocation. For now though, let's just revert. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 64e62bd ("drm/dp_mst: Remove VCPI while disabling topology mgr") Cc: Sean Paul <sean@poorly.run> Cc: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20200117205149.97262-1-lyude@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>

A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit d0a255e) Orabug: 31292386 Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

A hang was observed in the fcport delete path when the device was responding slow and an issue-lip path (results in session termination) was taken. Fix this by issuing logo requests unconditionally. PID: 19491 TASK: ffff8e23e67bb150 CPU: 0 COMMAND: "kworker/0:0" #0 [ffff8e2370297bf8] __schedule at ffffffffb4f7dbb0 #1 [ffff8e2370297c88] schedule at ffffffffb4f7e199 #2 [ffff8e2370297c98] schedule_timeout at ffffffffb4f7ba68 #3 [ffff8e2370297d40] msleep at ffffffffb48ad9ff #4 [ffff8e2370297d58] qlt_free_session_done at ffffffffc0c32052 [qla2xxx] #5 [ffff8e2370297e20] process_one_work at ffffffffb48bcfdf #6 [ffff8e2370297e68] worker_thread at ffffffffb48bdca6 #7 [ffff8e2370297ec8] kthread at ffffffffb48c4f81 Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit f00b342) Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Orabug: 30846289 Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

A hang was observed in the fcport delete path when the device was responding slow and an issue-lip path (results in session termination) was taken. Fix this by issuing logo requests unconditionally. PID: 19491 TASK: ffff8e23e67bb150 CPU: 0 COMMAND: "kworker/0:0" #0 [ffff8e2370297bf8] __schedule at ffffffffb4f7dbb0 #1 [ffff8e2370297c88] schedule at ffffffffb4f7e199 #2 [ffff8e2370297c98] schedule_timeout at ffffffffb4f7ba68 #3 [ffff8e2370297d40] msleep at ffffffffb48ad9ff #4 [ffff8e2370297d58] qlt_free_session_done at ffffffffc0c32052 [qla2xxx] #5 [ffff8e2370297e20] process_one_work at ffffffffb48bcfdf #6 [ffff8e2370297e68] worker_thread at ffffffffb48bdca6 #7 [ffff8e2370297ec8] kthread at ffffffffb48c4f81 Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit f00b342) Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Orabug: 30846292 Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

[ Upstream commit 37d0ec3 ] Patch series "lib/sort & lib/list_sort: faster and smaller", v2. Because CONFIG_RETPOLINE has made indirect calls much more expensive, I thought I'd try to reduce the number made by the library sort functions. The first three patches apply to lib/sort.c. Patch #1 is a simple optimization. The built-in swap has special cases for aligned 4- and 8-byte objects. But those are almost never used; most calls to sort() work on larger structures, which fall back to the byte-at-a-time loop. This generalizes them to aligned *multiples* of 4 and 8 bytes. (If nothing else, it saves an awful lot of energy by not thrashing the store buffers as much.) Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice simple solid heapsort is preferable to more complex algorithms (sorry, Andrey), but it's possible to implement heapsort with far fewer comparisons (50% asymptotically, 25-40% reduction for realistic sizes) than the way it's been done up to now. And with some care, the code ends up smaller, as well. This is the "big win" patch. Patch #3 adds the same sort of indirect call bypass that has been added to the net code of late. The great majority of the callers use the builtin swap functions, so replace the indirect call to sort_func with a (highly preditable) series of if() statements. Rather surprisingly, this decreased code size, as the swap functions were inlined and their prologue & epilogue code eliminated. lib/list_sort.c is a bit trickier, as merge sort is already close to optimal, and we don't want to introduce triumphs of theory over practicality like the Ford-Johnson merge-insertion sort. Patch #4, without changing the algorithm, chops 32% off the code size and removes the part[MAX_LIST_LENGTH+1] pointer array (and the corresponding upper limit on efficiently sortable input size). Patch #5 improves the algorithm. The previous code is already optimal for power-of-two (or slightly smaller) size inputs, but when the input size is just over a power of 2, there's a very unbalanced final merge. There are, in the literature, several algorithms which solve this, but they all depend on the "breadth-first" merge order which was replaced by commit 835cc0c with a more cache-friendly "depth-first" order. Some hard thinking came up with a depth-first algorithm which defers merges as little as possible while avoiding bad merges. This saves 0.2*n compares, averaged over all sizes. The code size increase is minimal (64 bytes on x86-64, reducing the net savings to 26%), but the comments expanded significantly to document the clever algorithm. TESTING NOTES: I have some ugly user-space benchmarking code which I used for testing before moving this code into the kernel. Shout if you want a copy. I'm running this code right now, with CONFIG_TEST_SORT and CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last round of minor edits to quell checkpatch. I figure there will be at least one round of comments and final testing. This patch (of 5): Rather than having special-case swap functions for 4- and 8-byte objects, special-case aligned multiples of 4 or 8 bytes. This speeds up most users of sort() by avoiding fallback to the byte copy loop. Despite what ca96ab8 ("lib/sort: Add 64 bit swap function") claims, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) The functions also got renamed to reflect the fact that they support multiple words. In the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) With feedback from Andy Shevchenko, Rasmus Villemoes and Geert Uytterhoeven. Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <lkml@sdf.org> Acked-by: Andrey Abramov <st5pub@yandex.ru> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Daniel Wagner <daniel.wagner@siemens.com> Cc: Don Mullis <don.mullis@gmail.com> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 37d0ec3) Orabug: 31314977 Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

commit 2b86cb8 upstream. Be there a platform with the following layout: Regular NIC | +----> DSA master for switch port | +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit 1a33e10 ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] *** DEADLOCK *** [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit ab92d68 ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: ab92d68 ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 735a02f ] When queuing parameters fails, current code bails out without deleting the corresponding vb2 buffer from the driver buffer list, but the buffer is returned to vb2. This leads to stale list entries and a crash when the driver stops streaming: [ 224.935561] ipu3-imgu 0000:00:05.0: set parameters failed. [ 224.998932] ipu3-imgu 0000:00:05.0: set parameters failed. [ 225.064430] ipu3-imgu 0000:00:05.0: set parameters failed. [ 225.128534] ipu3-imgu 0000:00:05.0: set parameters failed. [ 225.194945] ipu3-imgu 0000:00:05.0: set parameters failed. [ 225.360363] ------------[ cut here ]------------ [ 225.360372] WARNING: CPU: 0 PID: 6704 at drivers/media/common/videobuf2/videobuf2-core.c:927 vb2_buffer_done+0x20f/0x21a [videobuf2_common] [ 225.360374] Modules linked in: snd_seq_dummy snd_seq snd_seq_device veth bridge stp llc tun nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 ip6t_REJECT ip6t_ipv6header cmac rfcomm uinput ipu3_imgu(C) ipu3_cio2 iova videobuf2_v4l2 videobuf2_common videobuf2_dma_sg videobuf2_memops ov13858 ov5670 v4l2_fwnode dw9714 acpi_als xt_MASQUERADE fuse iio_trig_sysfs cros_ec_sensors_ring cros_ec_light_prox cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf industrialio cros_ec_sensorsupport cdc_ether btusb btrtl btintel btbcm usbnet bluetooth ecdh_generic ecc hid_google_hammer iwlmvm iwl7000_mac80211 r8152 mii lzo_rle lzo_compress iwlwifi zram cfg80211 joydev [ 225.360400] CPU: 0 PID: 6704 Comm: CameraDeviceOps Tainted: G C 5.4.30 #5 [ 225.360402] Hardware name: HP Soraka/Soraka, BIOS Google_Soraka.10431.106.0 12/03/2019 [ 225.360405] RIP: 0010:vb2_buffer_done+0x20f/0x21a [videobuf2_common] [ 225.360408] Code: 5e 41 5f 5d e9 e0 16 5a d4 41 8b 55 08 48 c7 c7 8f 8b 5c c0 48 c7 c6 36 9a 5c c0 44 89 f9 31 c0 e8 a5 1c 5b d4 e9 53 fe ff ff <0f> 0b eb a3 e8 12 d7 43 d4 eb 97 0f 1f 44 00 00 55 48 89 e5 41 56 [ 225.360410] RSP: 0018:ffff9468ab32fba8 EFLAGS: 00010297 [ 225.360412] RAX: ffff8aa7a51577a8 RBX: dead000000000122 RCX: ffff8aa7a51577a8 [ 225.360414] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8aa7a5157400 [ 225.360416] RBP: ffff9468ab32fbd8 R08: ffff8aa64e47e600 R09: 0000000000000000 [ 225.360418] R10: 0000000000000000 R11: ffffffffc06036e6 R12: dead000000000100 [ 225.360420] R13: ffff8aa7820f1940 R14: ffff8aa7a51577a8 R15: 0000000000000006 [ 225.360422] FS: 00007c1146ffd700(0000) GS:ffff8aa7baa00000(0000) knlGS:0000000000000000 [ 225.360424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 225.360426] CR2: 00007aea3473a000 CR3: 00000000537d6004 CR4: 00000000003606f0 [ 225.360427] Call Trace: [ 225.360434] imgu_return_all_buffers+0x6f/0x8e [ipu3_imgu] [ 225.360438] imgu_vb2_stop_streaming+0xd6/0xf0 [ipu3_imgu] [ 225.360441] __vb2_queue_cancel+0x33/0x22d [videobuf2_common] [ 225.360443] vb2_core_streamoff+0x16/0x78 [videobuf2_common] [ 225.360448] __video_do_ioctl+0x33d/0x42a [ 225.360452] video_usercopy+0x34a/0x615 [ 225.360455] ? video_ioctl2+0x16/0x16 [ 225.360458] v4l2_ioctl+0x46/0x53 [ 225.360462] do_vfs_ioctl+0x50a/0x787 [ 225.360465] ksys_ioctl+0x58/0x83 [ 225.360468] __x64_sys_ioctl+0x1a/0x1e [ 225.360470] do_syscall_64+0x54/0x68 [ 225.360474] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 225.360476] RIP: 0033:0x7c118030f497 [ 225.360479] Code: 8a 66 90 48 8b 05 d1 d9 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 d9 2b 00 f7 d8 64 89 01 48 [ 225.360480] RSP: 002b:00007c1146ffa5a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 225.360483] RAX: ffffffffffffffda RBX: 00007c1140010018 RCX: 00007c118030f497 [ 225.360484] RDX: 00007c114001019c RSI: 0000000040045613 RDI: 000000000000004c [ 225.360486] RBP: 00007c1146ffa700 R08: 00007c1140010048 R09: 0000000000000000 [ 225.360488] R10: 0000000000000000 R11: 0000000000000246 R12: 00007c11400101b0 [ 225.360489] R13: 00007c1140010200 R14: 00007c1140010048 R15: 0000000000000001 [ 225.360492] ---[ end trace 73625ecfbd1c930e ]--- [ 225.360498] general protection fault: 0000 [#1] PREEMPT SMP PTI [ 225.360501] CPU: 0 PID: 6704 Comm: CameraDeviceOps Tainted: G WC 5.4.30 #5 [ 225.360502] Hardware name: HP Soraka/Soraka, BIOS Google_Soraka.10431.106.0 12/03/2019 [ 225.360505] RIP: 0010:imgu_return_all_buffers+0x52/0x8e [ipu3_imgu] [ 225.360507] Code: d4 49 8b 85 70 0a 00 00 49 81 c5 70 0a 00 00 49 39 c5 74 3b 49 bc 00 01 00 00 00 00 ad de 49 8d 5c 24 22 4c 8b 30 48 8b 48 08 <49> 89 4e 08 4c 89 31 4c 89 20 48 89 58 08 48 8d b8 58 fc ff ff 44 [ 225.360509] RSP: 0018:ffff9468ab32fbe8 EFLAGS: 00010293 [ 225.360511] RAX: ffff8aa7a51577a8 RBX: dead000000000122 RCX: dead000000000122 [ 225.360512] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8aa7a5157400 [ 225.360514] RBP: ffff9468ab32fc18 R08: ffff8aa64e47e600 R09: 0000000000000000 [ 225.360515] R10: 0000000000000000 R11: ffffffffc06036e6 R12: dead000000000100 [ 225.360517] R13: ffff8aa7820f1940 R14: dead000000000100 R15: 0000000000000006 [ 225.360519] FS: 00007c1146ffd700(0000) GS:ffff8aa7baa00000(0000) knlGS:0000000000000000 [ 225.360521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 225.360523] CR2: 00007aea3473a000 CR3: 00000000537d6004 CR4: 00000000003606f0 [ 225.360525] Call Trace: [ 225.360528] imgu_vb2_stop_streaming+0xd6/0xf0 [ipu3_imgu] [ 225.360531] __vb2_queue_cancel+0x33/0x22d [videobuf2_common] [ 225.360534] vb2_core_streamoff+0x16/0x78 [videobuf2_common] [ 225.360537] __video_do_ioctl+0x33d/0x42a [ 225.360540] video_usercopy+0x34a/0x615 [ 225.360542] ? video_ioctl2+0x16/0x16 [ 225.360546] v4l2_ioctl+0x46/0x53 [ 225.360548] do_vfs_ioctl+0x50a/0x787 [ 225.360551] ksys_ioctl+0x58/0x83 [ 225.360554] __x64_sys_ioctl+0x1a/0x1e [ 225.360556] do_syscall_64+0x54/0x68 [ 225.360559] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 225.360561] RIP: 0033:0x7c118030f497 [ 225.360563] Code: 8a 66 90 48 8b 05 d1 d9 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 d9 2b 00 f7 d8 64 89 01 48 [ 225.360565] RSP: 002b:00007c1146ffa5a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 225.360567] RAX: ffffffffffffffda RBX: 00007c1140010018 RCX: 00007c118030f497 [ 225.360569] RDX: 00007c114001019c RSI: 0000000040045613 RDI: 000000000000004c [ 225.360570] RBP: 00007c1146ffa700 R08: 00007c1140010048 R09: 0000000000000000 [ 225.360572] R10: 0000000000000000 R11: 0000000000000246 R12: 00007c11400101b0 [ 225.360574] R13: 00007c1140010200 R14: 00007c1140010048 R15: 0000000000000001 [ 225.360576] Modules linked in: snd_seq_dummy snd_seq snd_seq_device veth bridge stp llc tun nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 ip6t_REJECT ip6t_ipv6header cmac rfcomm uinput ipu3_imgu(C) ipu3_cio2 iova videobuf2_v4l2 videobuf2_common videobuf2_dma_sg videobuf2_memops ov13858 ov567 Fix this by moving the list_del() call just below the list_first_entry() call when the buffer no longer needs to be in the list. Fixes: 8ecc7c9 ("media: staging/intel-ipu3: parameter buffer refactoring") Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Bingbu Cao <bingbu.cao@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 467dc47 upstream. When doing a buffered write we always try to reserve data space for it, even when the file has the NOCOW bit set or the write falls into a file range covered by a prealloc extent. This is done both because it is expensive to check if we can do a nocow write (checking if an extent is shared through reflinks or if there's a hole in the range for example), and because when writeback starts we might actually need to fallback to COW mode (for example the block group containing the target extents was turned into RO mode due to a scrub or balance). When we are unable to reserve data space we check if we can do a nocow write, and if we can, we proceed with dirtying the pages and setting up the range for delalloc. In this case the bytes_may_use counter of the data space_info object is not incremented, unlike in the case where we are able to reserve data space (done through btrfs_check_data_free_space() which calls btrfs_alloc_data_chunk_ondemand()). Later when running delalloc we attempt to start writeback in nocow mode but we might revert back to cow mode, for example because in the meanwhile a block group was turned into RO mode by a scrub or relocation. The cow path after successfully allocating an extent ends up calling btrfs_add_reserved_bytes(), which expects the bytes_may_use counter of the data space_info object to have been incremented before - but we did not do it when the buffered write started, since there was not enough available data space. So btrfs_add_reserved_bytes() ends up decrementing the bytes_may_use counter anyway, and when the counter's current value is smaller then the size of the allocated extent we get a stack trace like the following: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 20138 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) CPU: 0 PID: 20138 Comm: kworker/u8:15 Not tainted 5.6.0-rc7-btrfs-next-58 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-btrfs-1754) RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Code: ff ff 48 (...) RSP: 0018:ffffbda18a4b3568 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff9ca076f5d800 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9ca068470410 RBP: fffffffffffff000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9ca079d58040 R11: 0000000000000000 R12: ffff9ca068470400 R13: ffff9ca0408b2000 R14: 0000000000001000 R15: ffff9ca076f5d800 FS: 0000000000000000(0000) GS:ffff9ca07a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005605dbfe7048 CR3: 0000000138570006 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: find_free_extent+0x4a0/0x16c0 [btrfs] btrfs_reserve_extent+0x91/0x180 [btrfs] cow_file_range+0x12d/0x490 [btrfs] run_delalloc_nocow+0x341/0xa40 [btrfs] btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs] ? find_lock_delalloc_range+0x221/0x250 [btrfs] writepage_delalloc+0xe8/0x150 [btrfs] __extent_writepage+0xe8/0x4c0 [btrfs] extent_write_cache_pages+0x237/0x530 [btrfs] ? btrfs_wq_submit_bio+0x9f/0xc0 [btrfs] extent_writepages+0x44/0xa0 [btrfs] do_writepages+0x23/0x80 __writeback_single_inode+0x59/0x700 writeback_sb_inodes+0x267/0x5f0 __writeback_inodes_wb+0x87/0xe0 wb_writeback+0x382/0x590 ? wb_workfn+0x4a2/0x6c0 wb_workfn+0x4a2/0x6c0 process_one_work+0x26d/0x6a0 worker_thread+0x4f/0x3e0 ? process_one_work+0x6a0/0x6a0 kthread+0x103/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff94ebdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffff94ebdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace f9f6ef8ec4cd8ec9 ]--- So to fix this, when falling back into cow mode check if space was not reserved, by testing for the bit EXTENT_NORESERVE in the respective file range, and if not, increment the bytes_may_use counter for the data space_info object. Also clear the EXTENT_NORESERVE bit from the range, so that if the cow path fails it decrements the bytes_may_use counter when clearing the delalloc range (through the btrfs_clear_delalloc_extent() callback). Fixes: 7ee9e44 ("Btrfs: check if we can nocow if we don't have data space") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…eout commit 2166e5e upstream. We always preallocate a data extent for writing a free space cache, which causes writeback to always try the nocow path first, since the free space inode has the prealloc bit set in its flags. However if the block group that contains the data extent for the space cache has been turned to RO mode due to a running scrub or balance for example, we have to fallback to the cow path. In that case once a new data extent is allocated we end up calling btrfs_add_reserved_bytes(), which decrements the counter named bytes_may_use from the data space_info object with the expection that this counter was previously incremented with the same amount (the size of the data extent). However when we started writeout of the space cache at cache_save_setup(), we incremented the value of the bytes_may_use counter through a call to btrfs_check_data_free_space() and then decremented it through a call to btrfs_prealloc_file_range_trans() immediately after. So when starting the writeback if we fallback to cow mode we have to increment the counter bytes_may_use of the data space_info again to compensate for the extent allocation done by the cow path. When this issue happens we are incorrectly decrementing the bytes_may_use counter and when its current value is smaller then the amount we try to subtract we end up with the following warning: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 657 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) CPU: 3 PID: 657 Comm: kworker/u8:7 Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-btrfs-1591) RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] Code: ff ff 48 (...) RSP: 0000:ffffa41608f13660 EFLAGS: 00010287 RAX: 0000000000001000 RBX: ffff9615b93ae400 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9615b96ab410 RBP: fffffffffffee000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff961585e62a40 R11: 0000000000000000 R12: ffff9615b96ab400 R13: ffff9615a1a2a000 R14: 0000000000012000 R15: ffff9615b93ae400 FS: 0000000000000000(0000) GS:ffff9615bb200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055cbbc2ae178 CR3: 0000000115794006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: find_free_extent+0x4a0/0x16c0 [btrfs] btrfs_reserve_extent+0x91/0x180 [btrfs] cow_file_range+0x12d/0x490 [btrfs] btrfs_run_delalloc_range+0x9f/0x6d0 [btrfs] ? find_lock_delalloc_range+0x221/0x250 [btrfs] writepage_delalloc+0xe8/0x150 [btrfs] __extent_writepage+0xe8/0x4c0 [btrfs] extent_write_cache_pages+0x237/0x530 [btrfs] extent_writepages+0x44/0xa0 [btrfs] do_writepages+0x23/0x80 __writeback_single_inode+0x59/0x700 writeback_sb_inodes+0x267/0x5f0 __writeback_inodes_wb+0x87/0xe0 wb_writeback+0x382/0x590 ? wb_workfn+0x4a2/0x6c0 wb_workfn+0x4a2/0x6c0 process_one_work+0x26d/0x6a0 worker_thread+0x4f/0x3e0 ? process_one_work+0x6a0/0x6a0 kthread+0x103/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace bd7c03622e0b0a52 ]--- ------------[ cut here ]------------ So fix this by incrementing the bytes_may_use counter of the data space_info when we fallback to the cow path. If the cow path is successful the counter is decremented after extent allocation (by btrfs_add_reserved_bytes()), if it fails it ends up being decremented as well when clearing the delalloc range (extent_clear_unlock_delalloc()). This could be triggered sporadically by the test case btrfs/061 from fstests. Fixes: 82d5902 ("Btrfs: Support reading/writing on disk free ino cache") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…parallel commit 6bd335b upstream. When balance and scrub are running in parallel it is possible to end up with an underflow of the bytes_may_use counter of the data space_info object, which triggers a warning like the following: [134243.793196] BTRFS info (device sdc): relocating block group 1104150528 flags data [134243.806891] ------------[ cut here ]------------ [134243.807561] WARNING: CPU: 1 PID: 26884 at fs/btrfs/space-info.h:125 btrfs_add_reserved_bytes+0x1da/0x280 [btrfs] [134243.808819] Modules linked in: btrfs blake2b_generic xor (...) [134243.815779] CPU: 1 PID: 26884 Comm: kworker/u8:8 Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 [134243.816944] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [134243.818389] Workqueue: writeback wb_workfn (flush-btrfs-108483) [134243.819186] RIP: 0010:btrfs_add_reserved_bytes+0x1da/0x280 [btrfs] [134243.819963] Code: 0b f2 85 (...) [134243.822271] RSP: 0018:ffffa4160aae7510 EFLAGS: 00010287 [134243.822929] RAX: 000000000000c000 RBX: ffff96159a8c1000 RCX: 0000000000000000 [134243.823816] RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffff96158067a810 [134243.824742] RBP: ffff96158067a800 R08: 0000000000000001 R09: 0000000000000000 [134243.825636] R10: ffff961501432a40 R11: 0000000000000000 R12: 000000000000c000 [134243.826532] R13: 0000000000000001 R14: ffffffffffff4000 R15: ffff96158067a810 [134243.827432] FS: 0000000000000000(0000) GS:ffff9615baa00000(0000) knlGS:0000000000000000 [134243.828451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [134243.829184] CR2: 000055bd7e414000 CR3: 00000001077be004 CR4: 00000000003606e0 [134243.830083] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [134243.830975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [134243.831867] Call Trace: [134243.832211] find_free_extent+0x4a0/0x16c0 [btrfs] [134243.832846] btrfs_reserve_extent+0x91/0x180 [btrfs] [134243.833487] cow_file_range+0x12d/0x490 [btrfs] [134243.834080] fallback_to_cow+0x82/0x1b0 [btrfs] [134243.834689] ? release_extent_buffer+0x121/0x170 [btrfs] [134243.835370] run_delalloc_nocow+0x33f/0xa30 [btrfs] [134243.836032] btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs] [134243.836725] ? find_lock_delalloc_range+0x221/0x250 [btrfs] [134243.837450] writepage_delalloc+0xe8/0x150 [btrfs] [134243.838059] __extent_writepage+0xe8/0x4c0 [btrfs] [134243.838674] extent_write_cache_pages+0x237/0x530 [btrfs] [134243.839364] extent_writepages+0x44/0xa0 [btrfs] [134243.839946] do_writepages+0x23/0x80 [134243.840401] __writeback_single_inode+0x59/0x700 [134243.841006] writeback_sb_inodes+0x267/0x5f0 [134243.841548] __writeback_inodes_wb+0x87/0xe0 [134243.842091] wb_writeback+0x382/0x590 [134243.842574] ? wb_workfn+0x4a2/0x6c0 [134243.843030] wb_workfn+0x4a2/0x6c0 [134243.843468] process_one_work+0x26d/0x6a0 [134243.843978] worker_thread+0x4f/0x3e0 [134243.844452] ? process_one_work+0x6a0/0x6a0 [134243.844981] kthread+0x103/0x140 [134243.845400] ? kthread_create_worker_on_cpu+0x70/0x70 [134243.846030] ret_from_fork+0x3a/0x50 [134243.846494] irq event stamp: 0 [134243.846892] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [134243.847682] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 [134243.848687] softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 [134243.849913] softirqs last disabled at (0): [<0000000000000000>] 0x0 [134243.850698] ---[ end trace bd7c03622e0b0a96 ]--- [134243.851335] ------------[ cut here ]------------ When relocating a data block group, for each extent allocated in the block group we preallocate another extent with the same size for the data relocation inode (we do it at prealloc_file_extent_cluster()). We reserve space by calling btrfs_check_data_free_space(), which ends up incrementing the data space_info's bytes_may_use counter, and then call btrfs_prealloc_file_range() to allocate the extent, which always decrements the bytes_may_use counter by the same amount. The expectation is that writeback of the data relocation inode always follows a NOCOW path, by writing into the preallocated extents. However, when starting writeback we might end up falling back into the COW path, because the block group that contains the preallocated extent was turned into RO mode by a scrub running in parallel. The COW path then calls the extent allocator which ends up calling btrfs_add_reserved_bytes(), and this function decrements the bytes_may_use counter of the data space_info object by an amount corresponding to the size of the allocated extent, despite we haven't previously incremented it. When the counter currently has a value smaller then the allocated extent we reset the counter to 0 and emit a warning, otherwise we just decrement it and slowly mess up with this counter which is crucial for space reservation, the end result can be granting reserved space to tasks when there isn't really enough free space, and having the tasks fail later in critical places where error handling consists of a transaction abort or hitting a BUG_ON(). Fix this by making sure that if we fallback to the COW path for a data relocation inode, we increment the bytes_may_use counter of the data space_info object. The COW path will then decrement it at btrfs_add_reserved_bytes() on success or through its error handling part by a call to extent_clear_unlock_delalloc() (which ends up calling btrfs_clear_delalloc_extent() that does the decrement operation) in case of an error. Test case btrfs/061 from fstests could sporadically trigger this. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 432cd2a upstream. When running relocation of a data block group while scrub is running in parallel, it is possible that the relocation will fail and abort the current transaction with an -EINVAL error: [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents [134243.999871] ------------[ cut here ]------------ [134244.000741] BTRFS: Transaction aborted (error -22) [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs] [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...) [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs] [134244.017151] Code: 48 c7 c7 (...) [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286 [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000 [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001 [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001 [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08 [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000 [134244.028024] FS: 00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000 [134244.029491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0 [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [134244.034484] Call Trace: [134244.034984] btrfs_cow_block+0x12b/0x2b0 [btrfs] [134244.035859] do_relocation+0x30b/0x790 [btrfs] [134244.036681] ? do_raw_spin_unlock+0x49/0xc0 [134244.037460] ? _raw_spin_unlock+0x29/0x40 [134244.038235] relocate_tree_blocks+0x37b/0x730 [btrfs] [134244.039245] relocate_block_group+0x388/0x770 [btrfs] [134244.040228] btrfs_relocate_block_group+0x161/0x2e0 [btrfs] [134244.041323] btrfs_relocate_chunk+0x36/0x110 [btrfs] [134244.041345] btrfs_balance+0xc06/0x1860 [btrfs] [134244.043382] ? btrfs_ioctl_balance+0x27c/0x310 [btrfs] [134244.045586] btrfs_ioctl_balance+0x1ed/0x310 [btrfs] [134244.045611] btrfs_ioctl+0x1880/0x3760 [btrfs] [134244.049043] ? do_raw_spin_unlock+0x49/0xc0 [134244.049838] ? _raw_spin_unlock+0x29/0x40 [134244.050587] ? __handle_mm_fault+0x11b3/0x14b0 [134244.051417] ? ksys_ioctl+0x92/0xb0 [134244.052070] ksys_ioctl+0x92/0xb0 [134244.052701] ? trace_hardirqs_off_thunk+0x1a/0x1c [134244.053511] __x64_sys_ioctl+0x16/0x20 [134244.054206] do_syscall_64+0x5c/0x280 [134244.054891] entry_SYSCALL_64_after_hwframe+0x49/0xbe [134244.055819] RIP: 0033:0x7f29b51c9dd7 [134244.056491] Code: 00 00 00 (...) [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7 [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003 [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000 [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0 [134244.067626] irq event stamp: 0 [134244.068202] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [134244.069351] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 [134244.070909] softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0 [134244.073432] ---[ end trace bd7c03622e0b0a99 ]--- The -EINVAL error comes from the following chain of function calls: __btrfs_cow_block() <-- aborts the transaction btrfs_reloc_cow_block() replace_file_extents() get_new_location() <-- returns -EINVAL When relocating a data block group, for each allocated extent of the block group, we preallocate another extent (at prealloc_file_extent_cluster()), associated with the data relocation inode, and then dirty all its pages. These preallocated extents have, and must have, the same size that extents from the data block group being relocated have. Later before we start the relocation stage that updates pointers (bytenr field of file extent items) to point to the the new extents, we trigger writeback for the data relocation inode. The expectation is that writeback will write the pages to the previously preallocated extents, that it follows the NOCOW path. That is generally the case, however, if a scrub is running it may have turned the block group that contains those extents into RO mode, in which case writeback falls back to the COW path. However in the COW path instead of allocating exactly one extent with the expected size, the allocator may end up allocating several smaller extents due to free space fragmentation - because we tell it at cow_file_range() that the minimum allocation size can match the filesystem's sector size. This later breaks the relocation's expectation that an extent associated to a file extent item in the data relocation inode has the same size as the respective extent pointed by a file extent item in another tree - in this case the extent to which the relocation inode poins to is smaller, causing relocation.c:get_new_location() to return -EINVAL. For example, if we are relocating a data block group X that has a logical address of X and the block group has an extent allocated at the logical address X + 128KiB with a size of 64KiB: 1) At prealloc_file_extent_cluster() we allocate an extent for the data relocation inode with a size of 64KiB and associate it to the file offset 128KiB (X + 128KiB - X) of the data relocation inode. This preallocated extent was allocated at block group Z; 2) A scrub running in parallel turns block group Z into RO mode and starts scrubing its extents; 3) Relocation triggers writeback for the data relocation inode; 4) When running delalloc (btrfs_run_delalloc_range()), we try first the NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC set in its flags. However, because block group Z is in RO mode, the NOCOW path (run_delalloc_nocow()) falls back into the COW path, by calling cow_file_range(); 5) At cow_file_range(), in the first iteration of the while loop we call btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum allocation size of 4KiB (fs_info->sectorsize). Due to free space fragmentation, btrfs_reserve_extent() ends up allocating two extents of 32KiB each, each one on a different iteration of that while loop; 6) Writeback of the data relocation inode completes; 7) Relocation proceeds and ends up at relocation.c:replace_file_extents(), with a leaf which has a file extent item that points to the data extent from block group X, that has a logical address (bytenr) of X + 128KiB and a size of 64KiB. Then it calls get_new_location(), which does a lookup in the data relocation tree for a file extent item starting at offset 128KiB (X + 128KiB - X) and belonging to the data relocation inode. It finds a corresponding file extent item, however that item points to an extent that has a size of 32KiB, which doesn't match the expected size of 64KiB, resuling in -EINVAL being returned from this function and propagated up to __btrfs_cow_block(), which aborts the current transaction. To fix this make sure that at cow_file_range() when we call the allocator we pass it a minimum allocation size corresponding the desired extent size if the inode belongs to the data relocation tree, otherwise pass it the filesystem's sector size as the minimum allocation size. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com> (cherry picked from commit e1a86db) Orabug: 31602420 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Conflicts: drivers/md/md-bitmap.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com> (cherry picked from commit e1a86db) Orabug: 31682033 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com> (cherry picked from commit e1a86db) Orabug: 31682031 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com>

[ Upstream commit ab0db04 ] When running with -o enospc_debug you can get the following splat if one of the dump_space_info's trip ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G OE ------------------------------------------------------ dd/563090 is trying to acquire lock: ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs] but task is already holding lock: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&cache->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs] find_free_extent+0x7ef/0x13b0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x340 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x530 [btrfs] btrfs_cow_block+0x106/0x210 [btrfs] commit_cowonly_roots+0x55/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&space_info->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs] btrfs_inode_rsv_release+0x4f/0x170 [btrfs] btrfs_clear_delalloc_extent+0x155/0x480 [btrfs] clear_state_bit+0x81/0x1a0 [btrfs] __clear_extent_bit+0x25c/0x5d0 [btrfs] clear_extent_bit+0x15/0x20 [btrfs] btrfs_invalidatepage+0x2b7/0x3c0 [btrfs] truncate_cleanup_page+0x47/0xe0 truncate_inode_pages_range+0x238/0x840 truncate_pagecache+0x44/0x60 btrfs_setattr+0x202/0x5e0 [btrfs] notify_change+0x33b/0x490 do_truncate+0x76/0xd0 path_openat+0x687/0xa10 do_filp_open+0x91/0x100 do_sys_openat2+0x215/0x2d0 do_sys_open+0x44/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&tree->lock#2){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 find_first_extent_bit+0x32/0x150 [btrfs] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs] __btrfs_write_out_cache+0x172/0x480 [btrfs] btrfs_write_out_cache+0x7a/0xf0 [btrfs] btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs] commit_cowonly_roots+0x245/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] close_ctree+0xf9/0x2f5 [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&ctl->tree_lock){+.+.}-{2:2}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 _raw_spin_lock+0x25/0x30 btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &ctl->tree_lock --> &space_info->lock --> &cache->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&cache->lock); lock(&space_info->lock); lock(&cache->lock); lock(&ctl->tree_lock); *** DEADLOCK *** 6 locks held by dd/563090: #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs] #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs] #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs] #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs] #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] stack backtrace: CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? wake_up_klogd.part.0+0x30/0x40 lock_acquire+0xab/0x360 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] _raw_spin_lock+0x25/0x30 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] ? start_transaction+0xe0/0x5b0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] ? ktime_get_coarse_real_ts64+0xa8/0xd0 ? trace_hardirqs_on+0x1c/0xe0 btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because we're holding the block_group->lock while trying to dump the free space cache. However we don't need this lock, we just need it to read the values for the printk, so move the free space cache dumping outside of the block group lock. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 18c850f upstream. There's long existed a lockdep splat because we open our bdev's under the ->device_list_mutex at mount time, which acquires the bd_mutex. Usually this goes unnoticed, but if you do loopback devices at all suddenly the bd_mutex comes with a whole host of other dependencies, which results in the splat when you mount a btrfs file system. ====================================================== WARNING: possible circular locking dependency detected 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted ------------------------------------------------------ systemd-journal/509 is trying to acquire lock: ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] but task is already holding lock: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (sb_pagefaults){.+.+}-{0:0}: __sb_start_write+0x13e/0x220 btrfs_page_mkwrite+0x59/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 -> #5 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x60/0x80 _copy_from_user+0x20/0xb0 get_sg_io_hdr+0x9a/0xb0 scsi_cmd_ioctl+0x1ea/0x2f0 cdrom_ioctl+0x3c/0x12b4 sr_block_ioctl+0xa4/0xd0 block_ioctl+0x3f/0x50 ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&cd->lock){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 sr_block_open+0xa2/0x180 __blkdev_get+0xdd/0x550 blkdev_get+0x38/0x150 do_dentry_open+0x16b/0x3e0 path_openat+0x3c9/0xa00 do_filp_open+0x75/0x100 do_sys_openat2+0x8a/0x140 __x64_sys_openat+0x46/0x70 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 __blkdev_get+0x6a/0x550 blkdev_get+0x85/0x150 blkdev_get_by_path+0x2c/0x70 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] open_fs_devices+0x88/0x240 [btrfs] btrfs_open_devices+0x92/0xa0 [btrfs] btrfs_mount_root+0x250/0x490 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x119/0x380 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x8c6/0xca0 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_run_dev_stats+0x36/0x420 [btrfs] commit_cowonly_roots+0x91/0x2d0 [btrfs] btrfs_commit_transaction+0x4e6/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_commit_transaction+0x48e/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}: __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 __mutex_lock+0x7b/0x820 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_pagefaults); lock(&mm->mmap_lock#2); lock(sb_pagefaults); lock(&fs_info->reloc_mutex); *** DEADLOCK *** 3 locks held by systemd-journal/509: #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs] stack backtrace: CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x134/0x150 __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] __mutex_lock+0x7b/0x820 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xb0 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] ? sched_clock+0x5/0x10 do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x7fa3972fdbfe Code: Bad RIP value. Fix this by not holding the ->device_list_mutex at this point. The device_list_mutex exists to protect us from modifying the device list while the file system is running. However it can also be modified by doing a scan on a device. But this action is specifically protected by the uuid_mutex, which we are holding here. We cannot race with opening at this point because we have the ->s_mount lock held during the mount. Not having the ->device_list_mutex here is perfectly safe as we're not going to change the devices at this point. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add some comments ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com> (cherry picked from commit e1a86db) Orabug: 31682035 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Conflicts: drivers/md/md-bitmap.c Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com> (cherry picked from commit e1a86db) Orabug: 31683116 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Conflicts: drivers/md/md-bitmap.c Signed-off-by: Brian Maly <brian.maly@oracle.com>

[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 #4 0x560578e07045 in test__pmu tests/pmu.c:155 #5 0x560578de109b in run_test tests/builtin-test.c:410 #6 0x560578de109b in test_and_print tests/builtin-test.c:440 #7 0x560578de401a in __cmd_test tests/builtin-test.c:661 #8 0x560578de401a in cmd_test tests/builtin-test.c:807 #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 1cf043baa068a5deecaf800e9122b462ac418159)

be2net added 3 commits November 21, 2018 02:12

be2net: Fix HW stall issue in Lancer

4cd817d

Lancer HW cannot handle a TSO packet with a single segment. Disable TSO/GSO for such packets. Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Update the driver version to 12.0.0.0

c337335

gregmarsden closed this Nov 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uek5 master driver update#5

Uek5 master driver update#5
be2net wants to merge 3 commits intooracle:uek5/masterfrom
be2net:uek5-master-DRIVER-UPDATE

be2net commented Nov 29, 2018

Uh oh!

gregmarsden commented Nov 30, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

be2net commented Nov 29, 2018

Uh oh!

gregmarsden commented Nov 30, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants